TEST NORMS AND SAMPLING THEORY


FREDERIC M. LORD* 
Educational Testing Service 
Princeton, New Jersey 


Summary 


TEST NORMS usually constitute a set of sam- 
ple statistics (percentiles or percentile ranks) 
used by the test consumer to make inferences regarding
a norms population from which the norms sample was
drawn. In such cases, it is important that the
selection of the norms sample and
the subsequent statistical treatment shall be such 
as to minimize the inevitable sampling errors in 
the published norms table. 

At least in the case of tests used in our schools
and colleges, the problems of optimal sampling 
methods and optimal statistical procedures are 
considerably complicated by the fact that whereas 
the published norms table is most commonly con- 
sidered as representing a group of students, the 
sampling unit used in drawing the norms sample 
is usually the school. Since schools usually dif- 
fer markedly from each other in mean score, the 
sampling errors in the final norms table will or- 
dinarily be large unless the number of schools in 
the norms sample is large. The number of students
in the norms sample typically has only a
weak and indirect relation to the size of the sam- 
pling errors in the norms table. 

In the most common method of drawing a norms
sample, all students in each school selected are 
tested and included in the norms, provided they 
are at the proper grade level or levels. This con- 
stitutes simple cluster sampling. In such sam- 
pling, the mean of the norms sample is usually 
a biased estimate of the mean of the norms popu- 
lation. An unbiased estimate can be constructed 
from the sample data (eq. 3), but its sampling er- 
ror (eq. 4) is likely to be so large in comparison 
to that of the mean of the norms sample (eqs. 2, 
2', 2'') as to preclude the use of the unbiased es- 
timate in most situations. 

Numerical examples for two typical situations 
show that under simple cluster sampling, from 
twelve to thirty times as many students must be 
tested in order to obtain the same accuracy for 
the norms table that would be obtained under simple
random sampling of students. Unfortunately, simple
random sampling of students, without regard to their
‘‘clustering’’ in schools, is ordinarily impractical.
The best practical procedure to avoid the excessive
sampling errors characteristic of cluster sampling is
to use two-stage sampling: a sample of schools is drawn
in the usual fashion, and then a random sample of
students in each school is selected for testing.

At the lower educational levels, two-stage sam- 
pling may often be feasible only when two or more 
tests can be normed simultaneously, in which case 
only a fraction of the students in each classroom 
need take any one of the tests. At the college level,
it is usually impossible to arrange for the normative
testing of all students at a given grade level in
a variety of institutions. In this situation, the 
common procedure is to test as many students as 
possible in a relatively small number of colleges. 
Not only does this increase the sampling errors 
in the published norms table unnecessarily; it also 
prevents the sample of students from being a ran- 
dom sample, since any sizable group of students 
assembled at any given time of day ordinarily ex- 
cludes those with conflicting scheduled activities. 
In order to obtain a truly representative norms 
sample for colleges, it will ordinarily be neces- 
sary to test so few students in each college that 
some time can be found when all students chosen 
in a given institution can be scheduled for a single 
testing session. 

Three different ways of drawing two-stage sam- 
ples are described. In each case, the question of 
the choice of the sample statistic to be used for 
estimating the mean score in the norms population
is discussed, with particular reference to the bias 
and sampling error of each estimate. Standard 
error formulas are derived for three of these
estimates (eqs. 13, 18, 20). Some numerical examples
are given in Table I, showing the economies
achieved by using two-stage sampling instead of 
simple cluster sampling, and in Table III illustrat- 
ing the relative efficiency of various two-stage 
sampling methods. 

A simple method is given by means of which
the same formulas can be used to provide standard
errors for each percentile rank in the norms table
as well as for the mean of the norms distribution.
A numerical example in Table II indicates that the
advantage of two-stage sampling over cluster sampling
holds for estimating percentile ranks as well as for
estimating the arithmetic mean.


*The writer is indebted to Professor Francis Anscombe for helpful comments on a draft of this paper.

Methods of obtaining school-mean norms from
two-stage sampling data are briefly discussed. 

It will ordinarily be important to stratify the 
norms population on certain school characteristics
related to test score, such as geographical
region and type of support (public, private, or
church). Two-stage sampling may then be carried
out within each stratum separately. Sampling
errors in the norms can be reduced by stratifying
on school size, even when size is unrelated
to test score. Since it is difficult or impractical 
to stratify on more than a few dimensions simul- 
taneously, it will often be preferable to deal with 
school size by one of the two-stage sampling meth- 
ods discussed rather than by stratifying on size. 


Introduction 


If the consumer of test scores is familiar with 
the items in the test administered, it may be 
meaningful to him to interpret the raw score of a 
single examinee as representing the achievement
of answering a certain number of these items
correctly. More commonly, however, the interpretation
of an examinee’s test score is normative,
i.e., his test performance is interpreted by com- 
paring it with the test performance of a more or 
less well defined group, called the norms group. 

It is desirable that a set of test norms should
be based on a group that (i) can be easily and
definitively described, (ii) is familiar to the consumer
of test scores, (iii) includes as one of its members
the examinee whose test score is to be interpreted.
The norms group may itself satisfy the foregoing
conditions, or it may be a more or less
representative sample from a larger norms population
that satisfies these conditions.

If an appropriate norms population is small 
(for example, if it consists of all seniors graduating
from High School A in June 1957), the norms
group may conveniently coincide with the norms 
population, and there is no sampling problem to 
be discussed. Frequently, however, those norms 
populations that are small enough to be tested ex- 
haustively by the test author or publisher lack one 
or more of the properties needed for useful score 
interpretation. In this situation, it would be de- 
sirable for the norms group to be a representative 
sample of some larger norms population. 

Various methods of choosing such a sample 
and their effects on the published norms will be 
discussed here. Certain of these sampling methods
will lead to norms that are both more economical
and more representative than most of
those in current use.

Most nationally distributed tests are provided 
with some sort of ‘‘national’’ norms. A nationwide
norms population has certain advantages: (i)
the population can be readily and clearly defined 
for the test-score consumer; (ii) most examinees 
tested are likely to be members of this group, or 
of some presumably very similar group following 
a year or two later; (iii) such a group is roughly 
equally appropriate for publishers in various parts 
of the country, thus in theory making test norms 
comparable from publisher to publisher. 

One main trouble with using a nationwide group
as a norms population is that in most cases it is 
utterly impossible at present to test a representa- 
tive sample of such a group. Many schools, for 
example, are simply unwilling to test. In this 
case, the norms population can, at best, only be 
described as consisting of ‘‘schools that are willing
to test.’’ This description, moreover, is a very
inadequate one, since the willingness of a school 
to test varies widely depending on the demands 
made and on the inducements offered. 

The disadvantages of a ‘‘national’’ norms popu- 
lation lead many thoughtful persons to urge that 
the use of such populations should be abandoned. 
It seems likely to the writer, however, that the 
future will bring increasingly effective attempts to 
obtain representative national norms, at least for 
a few of the more important tests. 

The present paper deals explicitly with certain 
sampling problems that arise when an attempt is 
made to obtain aptitude or achievement test norms 
representative of students in those schools through- 
out the nation that are ‘‘willing to test.’’ Most of 
the principles to be discussed will have obvious 
applications in obtaining norms for other popula- 
tions. 


Sampling Fluctuations in a Simple Case 





Before discussing the sampling problems that 
arise in obtaining ‘‘national’’ norms, it will be 
helpful to consider a simpler situation. Suppose 
that in a large university it is decided to administer
a certain test to a random sample of freshmen;
and let it be assumed, for convenience, that the 
sample is small enough and the total number of 
freshmen is large enough so that the usual formu- 
las for sampling from an infinite population may 
be applied. 

The ‘‘norms’’ thus obtained for the freshmen
at this university will consist of a table showing 
the percentile rank for each test score, or alter- 
natively, showing the percentile (i.e., the test 
score) corresponding to each percentile rank. The 
norms table obtained from a random sample of 
freshmen will differ, because of sampling fluctuations,
from the table that would have resulted if
all freshmen had been tested. Each number in 
the body of the norms table has a standard error 
that indicates the amount of this sampling fluctuation.

Two different kinds of standard errors may be
thought of, corresponding to the different ways of 
organizing the norms table. Suppose, for exam- 
ple, that if all freshmen are tested, 50 percent 
will obtain a raw score below 20. If there are $M = 100$
freshmen in the sample tested, the usual binomial
formula gives the standard error of the
percentage as $\sqrt{(.5)(.5)/100} = .05$. This standard
error indicates that if many different random 
samples of 100 freshmen were to be obtained at 
this university, about two-thirds of the resulting 
norms tables would assign a percentile rank be- 
tween 45 and 55 to a test score of 20, and about 
95 percent of these norms tables would assign a 
percentile rank between 40 and 60 to this score. 

The other type of standard error is expressed 
in the units of the score scale. Although these 
units can certainly not be considered as strictly 
equal, they are usually considered to be more 
nearly equal than are the units between succes- 
sive percentile ranks. To obtain an example of 
this second kind of standard error, suppose that 
the standard deviation of the scores of all freshmen
would be $S_x = 10$ if all were tested. In samples
of $M = 100$ from such a population, the sample
mean will have a standard error of
$\mathrm{S.E.}_{\bar{x}} = S_x/\sqrt{M} = 1$. From this it is concluded that the
sample mean will fall within one score point of the 
population mean in two-thirds of the samples, and 
within two score points of the population mean in 
95 percent of the samples. 
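
The two kinds of standard error just described can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the inputs (p = .50, M = 100, a score standard deviation of 10) are the ones used in the example above.

```python
import math

def se_percentile_rank(p, m):
    """Binomial standard error of a proportion p in a random sample of m students."""
    return math.sqrt(p * (1.0 - p) / m)

def se_mean(sd, m):
    """Standard error of the mean of a random sample of m students."""
    return sd / math.sqrt(m)

se_rank = se_percentile_rank(0.50, 100)   # in proportion units (.05 = 5 percentile ranks)
se_score = se_mean(10.0, 100)             # in score-scale units

print(round(se_rank, 3), round(se_score, 3))  # 0.05 1.0
```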

Although means are often presented in conjunc- 
tion with norms tables, the norms table itself us- 
ually gives only the fiftieth percentile or median. 
Assuming a normal distribution, it is known that 
the standard error of a median is 25 percent larg- 
er than the standard error of the mean. The 
standard error of other percentile points can be 
obtained from the standard error of the mean by 
multiplying by the factor $\sqrt{p(1 - p)}/z$ [cf. 7, eq.
16.1], where p is the proportion of cases above
or below the percentile point, and z is the corre- 
sponding ordinate of the normal curve. It is thus 
found that in the present numerical example the 
standard error for the median is 1.25 score points; 
for the first and third quartiles, 1.36 points; for 
the first and ninth deciles, 1.71 points; for the
fifth and ninety-fifth percentiles, 2.11 points; and 
for the first and ninety-ninth percentiles, 3.74 
points. 
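
The multiplying factor can be evaluated directly, as in the following sketch. It assumes a normal score distribution, as the text does, and uses Python's standard-library `NormalDist`; the function name `percentile_se_factor` is hypothetical. The values it prints agree with the figures quoted above to within rounding in the last digit.

```python
import math
from statistics import NormalDist

def percentile_se_factor(p):
    """Factor sqrt(p(1-p))/z, where z is the normal ordinate at the p-th quantile."""
    nd = NormalDist()                 # standard normal curve
    z = nd.pdf(nd.inv_cdf(p))         # ordinate at the percentile point
    return math.sqrt(p * (1.0 - p)) / z

se_of_mean = 1.0                      # from the example with M = 100 and S_x = 10
factors = {p: percentile_se_factor(p) for p in (0.50, 0.25, 0.10, 0.05, 0.01)}

for p, f in factors.items():
    print(p, round(se_of_mean * f, 2))
```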

In what follows, attention will at first be restricted
to the standard error of the mean. A simple
method for obtaining the standard error of any
percentile rank will be described in a later section.


Cluster Sampling 





From a logical point of view, the fundamental 
unit for norms purposes ordinarily is the individual
student (an exception is ‘‘school-mean norms,’’
which will be discussed in a later section). In
practice, however, the unit of sampling used in 
the development of national norms ordinarily has 
been the school, not the individual student. The 
author or publisher typically attempts to obtain a 
‘‘representative’’ set of schools; he then tests all 
students at a given grade level, or in a given course,
in those schools.

This type of sampling is called cluster sampling.
It is in many cases a natural and reasonable
way to obtain a norms sample, especially at
the elementary and secondary school levels, where 
it is likely to be easier for a school to test all stu- 
dents in a given grade or course than to test some 
fraction of these students. The fact is frequently 
overlooked, however, that the elimination of large
sampling fluctuations from the norms table re- 
quires not that a large number of students shall 
have been tested, but that a relatively large num- 
ber of schools shall be represented in the norms. 
The exact situation can be made clear by giving 
the formula for the standard error of the mean 
score in cluster sampling and by a numerical ex- 
ample. 

If a sample of $n$ schools is drawn at random
from a population of schools willing to test,
and if all $M_i$ ($i = 1, \ldots, n$) students in school $i$ who
meet certain conditions (who, for example, are at
a given grade level) are tested, then the mean of
the resulting norms distribution is exactly equal
to the weighted average of the school means, the
weights being the frequencies $M_i$:


$$\bar{y}' = \frac{\sum_{i=1}^{n} M_i \bar{Y}_i}{\sum_{i=1}^{n} M_i}, \tag{1}$$


where $\bar{y}'$ is the quantity defined by (1), being equal
in the present case to the mean of the norms distribution,
and $\bar{Y}_i$ is the mean score of the students
tested at school $i$.* The quantity $M_i$ will be referred
to as the ‘‘size of school $i$,’’ although strictly
speaking it is usually the size of a single grade
at school $i$.
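
The identity between the size-weighted average of school means and the mean of all tested students can be checked on a toy data set. The scores below are hypothetical and serve only to illustrate equation 1.

```python
from statistics import mean

# Hypothetical scores for three schools of unequal size (illustrative data only).
schools = [
    [52, 48, 50, 54],            # school 1, M_1 = 4
    [40, 44],                    # school 2, M_2 = 2
    [60, 58, 62, 56, 64, 60],    # school 3, M_3 = 6
]

sizes = [len(s) for s in schools]
school_means = [mean(s) for s in schools]

# Equation 1: school means weighted by school size.
weighted = sum(m * y for m, y in zip(sizes, school_means)) / sum(sizes)

# Mean of the pooled norms distribution (every tested student counted once).
pooled = mean(x for s in schools for x in s)

# The unweighted mean of school means is in general a different number.
unweighted = mean(school_means)

print(weighted, pooled, unweighted)
```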
In order to write down the standard error of $\bar{y}'$,
certain relevant details must first be specified:


1. It will be assumed that the total population
of schools is large compared to the number
of schools in the sample randomly drawn
from it.

2. Each school in the population has an equal
chance of being included in the norms sample.
As will be seen later, this is not always
the best way to sample, but it is a natural
way, since under this procedure each
student in the schools in the population has
an equal chance of being included.

3. The number of schools to be included in the
sample is specified in advance. (An alternative
procedure would be to require that
the sampling should continue until a predetermined
number of students had been obtained.)


*A summary of notation is given for easy reference at the end of this article. In general, capital letters
are used for population values, small letters for sample values. The students in a given grade in
school i are for this purpose considered to constitute a (finite) population.


The order of magnitude of the sampling vari- 
ance of y' in equation 1 is most readily seen by 
examining the special case where all schools are 
the same size. In this special case (1) becomes


$$\bar{y}' = \frac{1}{n}\sum_{i=1}^{n} \bar{Y}_i, \tag{1'}$$


so that the mean of the norms distribution is merely
the arithmetic mean of the school means. The
usual formula for the standard error of an arithmetic
mean applies, so that for this special case


$$\mathrm{S.E.}(\bar{y}') = \frac{S(\bar{Y}_i)}{\sqrt{n}}, \tag{2'}$$


where $S(\bar{Y}_i)$ is simply the standard deviation of
the school means in the norms population. For
computational purposes, both in (2') and in subsequent
standard error formulas, unknown population
values such as $S(\bar{Y}_i)$ may be replaced by the
corresponding statistics computed from the norms
sample at hand. It is apparent from (2') that the
sampling fluctuations in the usual norms table depend
primarily on $n$, the number of schools in the
norms sample, rather than on the number of students.

The fact that schools actually are very far from
being of equal size increases the standard error
above that in (2'). When schools are of unequal
size, both numerator and denominator in equation
1 are subject to sampling fluctuations. There are
formulas readily available in sampling theory that
could be applied here if it could be assumed that
in the population the standard deviation of
school size ($S_M$) is small compared to the mean
size ($\bar{M}$) [1: 251]. However, data available to the
writer for grade 6 in a sample of 1324 schools
throughout the nation, excluding schools with fewer
than 10 sixth-grade students, give the estimated
values $S_M = 32$, $\bar{M} = 58$; similar data for 426 tenth
grades show $S_M = 91$, $\bar{M} = 108$; similar data for
208 thirteenth grades show $S_M = 700$, $\bar{M} = 531$.
Although these numerical results cannot be taken
as exact and final, they are sufficient to rule out
the assumption that variation in school size is
small compared to its average value.

If $M_i$ and $\bar{Y}_i$ are considered to be two stochastic
variables, a numerical value of each being associated
with each school, then $\bar{y}'$ can be written as a
function of moments of these variables:


$$\bar{y}' = m'_{11}/m'_{10},$$


where $m'$ is the usual symbol for a raw sample
moment. Application of the usual methods for obtaining
large-sample standard errors [4: 208 ff.]
then yields the following approximate result (presented
here for completeness; the reader may
wish to skip it):


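$$\mathrm{S.E.}^2(\bar{y}') = \frac{S^2(\bar{Y}_i)}{n}\left[1 + \frac{\mu_{22}}{\bar{M}^2 S^2(\bar{Y}_i)} + \frac{2\mu_{12}}{\bar{M} S^2(\bar{Y}_i)} - 3\rho^2 C_M^2 - \frac{2\rho C_M \mu_{21}}{\bar{M}^2 S(\bar{Y}_i)} + \rho^2 C_M^4\right], \tag{2}$$


where $C_M = S_M/\bar{M}$ is the coefficient of variation
for school size, $\rho$ is the population correlation between
school size ($M_i$) and average school achievement
($\bar{Y}_i$), $\mu_{22} = \sum_{i=1}^{N} (M_i - \bar{M})^2(\bar{Y}_i - \mu'_{01})^2/N$ is a
fourth-degree moment for the $N$ schools in the population,
$\mu_{12} = \sum_{i=1}^{N} (M_i - \bar{M})(\bar{Y}_i - \mu'_{01})^2/N$, and
$\mu_{21} = \sum_{i=1}^{N} (M_i - \bar{M})^2(\bar{Y}_i - \mu'_{01})/N$. It may be noted that this
and the other standard error formulas given here
make no assumption of normality.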

For the special case when $C_M = S_M/\bar{M} = 0$, i.e.,
when all schools are the same size, equation 2 reduces
to equation 2', as it should.

A good approximation to (2) can be obtained in
most cases by assuming that observed school
achievement is totally independent of school size.
Mollenkopf [5], using four achievement tests and
three aptitude tests, found a median correlation
of .00 between school size and mean score in 99
ninth grades and a median correlation of .01 in
106 twelfth grades. Zero correlation is not the
same as total independence, but the latter may be
assumed for the purpose of getting a convenient
approximation to the desired standard error. Under
this assumption, $\mu_{22} = \sum (M_i - \bar{M})^2 \sum (\bar{Y}_i - \mu'_{01})^2/N^2 = S_M^2 S^2(\bar{Y}_i)$,
$\mu_{21} = \mu_{12} = 0$, and (2) can
be written


$$\mathrm{S.E.}^2(\bar{y}') = \frac{1}{n} S^2(\bar{Y}_i)\,(1 + C_M^2). \tag{2''}$$
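
The approximation just given, with squared standard error equal to $S^2(\bar{Y}_i)(1 + C_M^2)/n$, can be checked by a small simulation. The sketch below is not part of the original analysis and rests on assumptions made purely for illustration: school sizes are drawn from a gamma distribution with mean 58 and standard deviation 32, school means from a normal distribution with mean 50 and standard deviation 4, and the two are generated independently. All function names are hypothetical.

```python
import math
import random

def cluster_mean(n, rng, mean_size=58.0, sd_size=32.0, mu=50.0, sd_means=4.0):
    """Size-weighted mean of n school means (equation 1) for one simulated sample."""
    shape = (mean_size / sd_size) ** 2     # gamma shape giving the requested C_M
    scale = mean_size / shape              # gamma scale giving the requested mean size
    num = den = 0.0
    for _ in range(n):
        m = rng.gammavariate(shape, scale)  # assumed school-size distribution
        y = rng.gauss(mu, sd_means)         # school mean, independent of size
        num += m * y
        den += m
    return num / den

def simulated_se(n, reps=8000, seed=12345):
    """Standard deviation of the cluster-sample mean over many simulated samples."""
    rng = random.Random(seed)
    draws = [cluster_mean(n, rng) for _ in range(reps)]
    centre = sum(draws) / reps
    return math.sqrt(sum((d - centre) ** 2 for d in draws) / (reps - 1))

def approx_se(n, sd_means=4.0, c_m=32.0 / 58.0):
    """Large-sample approximation: S * sqrt((1 + C_M^2) / n)."""
    return sd_means * math.sqrt((1.0 + c_m ** 2) / n)

sim = simulated_se(36)
approx = approx_se(36)
print(round(sim, 2), round(approx, 2))
```

Under these assumptions the simulated and approximate values should agree closely for n = 36; other size distributions with the same coefficient of variation may agree somewhat less well.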


A numerical example will throw some further 
light on the general magnitude of this standard er- 
ror. 

For the sake of the present and subsequent numerical
examples, it will be convenient to assume
that the test scores are expressed on a
standardized scale with a mean of $\bar{Y} = 50$ and a
standard deviation of $S_Y = 10$ for the total population
of students. An examination of norms tables
and other data for several different aptitude and
achievement tests at grade levels ranging from
the fourth to the fourteenth has shown the standard
deviation of school means to range from around
three-tenths up to around six-tenths of the standard
deviation of all students’ scores. Hence, the
standard deviation of school means will here be
taken as four points [$S(\bar{Y}_i) = 4$]. If $C_M$ is taken
to be 32/58 = .55, as in the sixth-grade data mentioned
above, the standard error of the mean of a
norms table based on a cluster sample of (say) $n = 36$
schools would, according to equation 2'', be approximately
$4\sqrt{(1 + .55^2)/36} = .76$. In comparison,
the same standard error, 0.76, would apply
to the mean of a norms table based on a completely
random sample of 173 students, since for such
a sample the standard error would be $10/\sqrt{173} = 0.76$.

If $C_M$ is taken to be 91/108 = .84, as in the
tenth-grade data mentioned above, the corresponding
standard error for cluster sampling would be
approximately $4\sqrt{(1 + .84^2)/36} = .87$. This same
standard error would apply to the mean of a norms
table based on a completely random sample of
only 132 students.
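
The two numerical examples can be reproduced as follows. This is a sketch only: the function names are hypothetical, and the standard error is rounded to two decimals before the equivalent random-sample size is computed, which is how the figures in the text appear to have been obtained.

```python
import math

def cluster_se(sd_school_means, c_m, n_schools):
    """Approximate S.E. of the norms-sample mean: S * sqrt((1 + C_M^2) / n)."""
    return sd_school_means * math.sqrt((1.0 + c_m ** 2) / n_schools)

def equivalent_random_n(se, sd_students=10.0):
    """Number of randomly sampled students giving the same S.E. of the mean."""
    return (sd_students / se) ** 2

results = {}
for label, sd_size, mean_size in (("grade 6", 32.0, 58.0), ("grade 10", 91.0, 108.0)):
    se = round(cluster_se(4.0, sd_size / mean_size, 36), 2)   # rounded as in the text
    results[label] = (se, round(equivalent_random_n(se)))

print(results)  # {'grade 6': (0.76, 173), 'grade 10': (0.87, 132)}
```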

For a fixed number of students, the (statistical)
efficiency of cluster sampling of schools relative
to random sampling of students is conveniently
described by the ratio between the numbers
of students that would have to be tested in order
to obtain a given size of standard error under
each of the two sampling methods. Thus, 2160
sixth-grade students (60 students in each of 36
schools) must be tested in order to achieve by
cluster sampling the same accuracy that could be
achieved by using a random sample of 173 students;
so the efficiency of cluster sampling in this
case is 173/2160 = .08. In the tenth grade, with
about 108 students in each of 36 schools, the
efficiency of cluster sampling is only about
132/3888 = .03, indicating that about thirty times
as many students must be tested when cluster
sampling is used as would be required under
simple random sampling. The efficiency at the
college level would be even lower.
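
The efficiency ratio defined above is simple enough to compute directly. In the sketch below the function name is hypothetical, and the tenth-grade calculation assumes the mean tenth-grade size of 108 students quoted earlier for each of the 36 schools.

```python
def cluster_efficiency(equivalent_random_n, n_schools, students_per_school):
    """Students needed under random sampling divided by students tested in clusters."""
    return equivalent_random_n / (n_schools * students_per_school)

eff_grade6 = cluster_efficiency(173, 36, 60)     # sixth-grade example from the text
eff_grade10 = cluster_efficiency(132, 36, 108)   # assumes 108 students per school

for label, eff in (("grade 6", eff_grade6), ("grade 10", eff_grade10)):
    # 1 / efficiency is the factor by which the number of students tested must grow.
    print(label, round(eff, 3), round(1.0 / eff, 1))
```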

The foregoing discussion should give a fair
picture of the relative (statistical) efficiency of
cluster sampling for norms purposes. Its purpose
is not to urge that cluster sampling of schools for
norms purposes be abandoned in favor of simple
random sampling of students, since this is clearly
administratively infeasible in most school situations.
Rather, the purpose is to emphasize that
it is the number of schools, not the number of students,
that determines the reliability of the typical
norms table. In contrast, it is usually the number
of students, not the number of schools, that primarily
determines the cost of norming a typical psychological
test. It is thus both economically and
statistically important to find practical ways of increasing
the number of schools tested while holding
constant, or possibly decreasing, the total
number of students tested, thus increasing the accuracy
of the norms without increasing the already
high cost of obtaining them.


How Much Sampling Error Is Tolerable 
in a Norms Table? 








Before going ahead to discuss possible improve- 
ments, some consideration may be given to the 
practical effect of the sampling errors of norms 
tables and to the question of reasonable tolerances. 
Is the standard error of .87 found in the tenth-grade
numerical example small enough to be tolerated?
The answer to this question, of course, depends
on the use to which the norms are being put.

If the test under discussion has a reliability of
.91 for the entire norms group, its average standard
error of measurement is $10\sqrt{1 - .91} = 3.0$. If
the score of an individual student is being interpreted,
an error of .87 in the norms table is small
compared to the standard error of measurement
of 3.0. Thus in this particular case, with the examinee
near the middle of the distribution, sampling
fluctuations in the norms tables will affect
the interpretation of his test score much less
than will sampling fluctuations due to unreliability
of the test. For an examinee with a score near an
extreme of the norms distribution, the situation
is somewhat different, since fluctuations in the
norms tables are very much more severe near the
extremes, as already discussed in some detail.

A comparison of the sampling fluctuations in 
norms tables with sampling fluctuations due to test 
unreliability is likely to be misleading. Errors 
of measurement vary at random from one individ- 
ual to the next and hence tend to cancel out in any 
work with group means; the same norms table is 
ordinarily used for all the individuals tested, how- 
ever, and hence errors in the norms tables do not 
cancel out but are repeated time after time. 

Suppose that separate norms are to be presented
for northern and for southern schools; suppose
that the norms sample consists of 34 schools, 25
of which are in the north and 9 of which are in the
south; and suppose that $S(\bar{Y}_i) = 4$ and $C_M = .84$
within each region. By equation 2'', the standard
error of the mean of the norms distribution is
1.0 for the north and 1.7 for the south. The standard
error of the median of each norms distribution
will be about 25 percent larger, or 1.3 and
2.2 respectively. The standard error of the difference
between the medians of the two norms tables
is $\sqrt{(1.3)^2 + (2.2)^2} = 2.5$ approximately.

If the true difference between the medians of
the norms populations is actually 2.5 score points
in favor of the northern region, there is one chance
in 6 that the medians of the published norms will
reverse this direction of relationship. If the true
difference between northern and southern norms
populations is 5 score points, or two standard
errors, there is still about one chance in 40 that
the published norms will present
the opposite of the correct picture.
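
The chance of a reversal can be computed directly if the observed difference between medians is assumed to be normally distributed about the true difference. That normality assumption, and the function name below, are introduced here for illustration only.

```python
import math
from statistics import NormalDist

def reversal_probability(true_diff, se_diff):
    """P(observed difference has the wrong sign) under a normal sampling distribution."""
    return NormalDist().cdf(-true_diff / se_diff)

# S.E. of the difference between the northern and southern medians.
se_diff = math.sqrt(1.3 ** 2 + 2.2 ** 2)

p_small = reversal_probability(2.5, se_diff)   # true difference of one S.E.
p_large = reversal_probability(5.0, se_diff)   # true difference of two S.E.

for label, p in (("2.5 points", p_small), ("5 points", p_large)):
    print(label, round(p, 3), "about 1 chance in", round(1.0 / p))
```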

This example illustrates the more dramatic 
type of error that might result from an inadequate 
norms sample. Actually, of course, it is hardly 
sufficient that the medians of the published norms 
for different groups of schools should avoid a re- 
versal of the true relationship. The norms are 
supposed to reflect the true relationship rather 
accurately both in direction and in amount. More- 
over, the norms should reflect the true relation- 
ship not only at the middle of the distribution but 
also near the extremes, where the sampling er- 
rors are much larger. Obviously, 34 schools are 
not nearly enough to produce norms for both north 
and south. 

A similar situation would exist if the attempt 
were made to establish fall and spring national 
norms experimentally, rather than obtaining one 
or the other by interpolation, as is commonly
done. Since fall and spring norms often differ 
only slightly, their experimental determination to 
a sufficient degree of accuracy would be extremely
expensive if the fall and spring samples were
drawn independently. In any situation where prac- 
tice effect can be neglected, the standard error of 
the difference between fall and spring norms can 
be greatly reduced by testing the same students on 
both occasions. 
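
The gain from testing the same students twice can be sketched with the usual formula for the variance of a difference between correlated quantities. The symbols here are illustrative and are not the paper's notation: $\mathrm{S.E.}_f$ and $\mathrm{S.E.}_s$ are the standard errors of the fall and spring statistics, and $r$ is the correlation between their sampling errors, which is zero for independently drawn samples and would presumably be substantial and positive when the same students are tested on both occasions.

$$\mathrm{S.E.}^2(\text{spring} - \text{fall}) = \mathrm{S.E.}_f^2 + \mathrm{S.E.}_s^2 - 2r\,\mathrm{S.E.}_f\,\mathrm{S.E.}_s$$

With equal standard errors this reduces to $2\,\mathrm{S.E.}^2(1 - r)$, so that, for example, a correlation of $r = .9$ would cut the squared standard error of the difference to one-tenth of its value under independent sampling.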

The foregoing considerations do not specify al- 
lowable tolerances for sampling fluctuations in the 
norms. They should, however, indicate some of 
the quantitative considerations that must be thought 
through before the size of the sample to be ob- 
tained for norms purposes can be decided upon. 





Biased and Unbiased Estimates in 
Cluster Sampling 








It is perhaps a surprising fact that the mean 
score of the norms sample is in general a biased 
estimate of the mean score in the norms popula- 
tion. The nature of this bias becomes most appar- 
ent from a consideration of the extreme case 
where the number of schools in the sample is n= 1. 
In this case the mean score in the norms sample
is simply $\bar{Y}_1$, the mean score in whatever school
is randomly chosen from the norms population.
The expected value of the mean of the norms sample
over a large number of similar samples is


$$E(\bar{Y}_1) = \mu'_{01} = \Big(\sum_{i=1}^{N} \bar{Y}_i\Big)\Big/N,$$


where $N$ is the number of schools in the norms population.
Since the mean score in the norms population is


$$\bar{Y} = \Big(\sum_{i=1}^{N} M_i \bar{Y}_i\Big)\Big/\sum_{i=1}^{N} M_i,$$


the expected value of the mean of the norms sample
is not equal to the mean of the norms population
unless either (i) all schools are the same size
or (ii) school size is uncorrelated with school
achievement. The bias is of order 1/n, and hence
tends to disappear when the number of schools is

large. (It can be shown by expanding (1) in a Taylor’s
series and taking expected values that if
school size were normally distributed (which it is
not), the bias when $n$ is large would be approximately
$E\bar{y}' - \bar{Y} = C_M S(\bar{Y}_i)\rho/n$, where $E\bar{y}'$ is the
expected value of the mean of the norms sample,
and $\rho$ is the population correlation between school
size and average school achievement.)

An unbiased estimate of the mean score of the 
norms population can be constructed from the 
same data that were used to construct the biased 
estimate of equation 1. The unbiased estimate is 


$$\bar{y}'' = \frac{\sum_{i=1}^{n} M_i \bar{Y}_i}{n\bar{M}}. \tag{3}$$

($\bar{M}$, the mean school size in the norms population,
is considered as known in advance, so that it need
not be estimated from the sample.)
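
The difference between the two estimates can be seen by enumerating every possible sample from a very small population. The four schools below are hypothetical, and the sketch draws samples of n = 2 without replacement from this tiny population, whereas the text assumes a population of schools that is large relative to the sample; the unbiasedness of the estimate in equation 3 holds in either case. Exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 4 schools: (size M_i, mean score Y_i).
# Larger schools are given higher means, so size and achievement are correlated.
population = [(20, 40), (40, 50), (60, 55), (120, 60)]

N = len(population)
total_size = sum(m for m, _ in population)
mean_size = Fraction(total_size, N)                                  # M-bar, known in advance
pop_mean = Fraction(sum(m * y for m, y in population), total_size)   # Y-bar

def biased(sample):
    """Equation 1: weighted mean using the sizes of the sampled schools only."""
    return Fraction(sum(m * y for m, y in sample), sum(m for m, _ in sample))

def unbiased(sample):
    """Equation 3: sum of M_i * Y_i divided by n times the known mean school size."""
    return Fraction(sum(m * y for m, y in sample), len(sample)) / mean_size

samples = list(combinations(population, 2))          # all equally likely samples, n = 2
exp_biased = sum(biased(s) for s in samples) / len(samples)
exp_unbiased = sum(unbiased(s) for s in samples) / len(samples)

print(float(pop_mean), float(exp_biased), float(exp_unbiased))
```

In this toy population the expected value of the estimate from equation 3 equals the population mean exactly, while the expected value of the ordinary norms-sample mean falls below it.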

Under the assumption that school size and school 
mean score are unrelated in the norms population,
the sampling variance of the unbiased estimate
for large n is found to be approximately


$$\mathrm{S.E.}^2(\bar{y}'') = \mathrm{S.E.}^2(\bar{y}') + C_M^2 \bar{Y}^2/n. \tag{4}$$


It is clear that whenever school size and school 
achievement are uncorrelated the unbiased esti- 
mate has a larger sampling variance than the bi- 
ased estimate of (1), except in the special case 
where Y = 0. This creates an odd situation since 
the choice of the origin for the scale of measure- 
ment used to report test results is an arbitrary 
choice. The standard error of the unbiased estimate
could thus be minimized by scaling test
scores so that their mean is near zero.

In actual practice, the test-score mean will almost
always be 15 or more if raw scores are used,
and frequently will be 50 or 500 if scaled scores
are used. With data of this sort, the sampling
error of the unbiased estimate becomes huge. In
the case of the numerical example for the sixth
grade, with $\bar{Y} = 50$, the standard error is found
from (4) to be $\sqrt{.76^2 + .55^2 \times 50^2/36} = 4.65$, in
contrast to a standard error of .76 for the biased
estimate $\bar{y}'$!
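
The arithmetic of this example can be checked with a short sketch. It reads equation 4 as the squared standard error of the biased estimate plus $C_M^2 \bar{Y}^2/n$, the division by the number of schools being taken from the worked example itself; the function name is hypothetical.

```python
import math

def se_unbiased(se_biased, c_m, pop_mean, n_schools):
    """Equation 4, as read here: sqrt(S.E.(biased)^2 + C_M^2 * Ybar^2 / n)."""
    return math.sqrt(se_biased ** 2 + (c_m * pop_mean) ** 2 / n_schools)

# Sixth-grade example: S.E. of the biased estimate .76, C_M = .55, mean 50, n = 36.
se_scaled = se_unbiased(0.76, 0.55, 50.0, 36)

# Moving the origin of the score scale so that the mean is zero removes the extra term.
se_centered = se_unbiased(0.76, 0.55, 0.0, 36)

print(round(se_scaled, 2), round(se_centered, 2))
```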

The true nature of the unbiased estimate can
be clearly seen in the following example where
the sample consists of only one school. Suppose
that by chance the school chosen is about half the
size of the average school in the norms population.
Then, by equation 3, the unbiased estimate of the
population mean will be obtained by computing the
mean score of the students in the school selected
and then dividing this mean by 2. On the other
hand, if the school selected at random for the
sample happens by chance to be about twice the
size of the average school in the norms population,
then the unbiased estimate is obtained by doubling
the mean score for the school. Such an estimate
might be a reasonable one if there were
known to be a high negative correlation between
school size and school achievement; it is clearly
an unreasonable one if there is little or no correlation
between school size and achievement, as is
assumed here.
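
The one-school case can be written out directly. This sketch assumes, as the description above implies, that with n = 1 the unbiased estimate is the school's score total divided by the average school size in the population; the numerical values are hypothetical.

```python
def unbiased_estimate_one_school(school_mean, school_size, average_size):
    """Single-school case (n = 1) of the unbiased estimate: the school's
    score total divided by the average school size in the population."""
    return school_size * school_mean / average_size

school_mean = 50.0      # hypothetical mean score of the one sampled school
average_size = 100      # hypothetical average school size in the population
for school_size in (50, 100, 200):
    est = unbiased_estimate_one_school(school_mean, school_size, average_size)
    print(school_size, est)
```

The same observed school mean thus yields very different estimates of the population mean, depending solely on the size of the school that happened to be drawn.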
Since the large standard error of the unbiased
estimate is clearly intolerable, this estimate will
be of no further interest to us here. This should
cause the reader no uneasiness, since unbiasedness
is merely an arbitrary requirement. An unbiased
statistic is defined as one whose arithmetic
mean over all samples is equal to the corresponding
population mean. The definition of bias
is thus based on a choice of the arithmetic mean
as the appropriate measure of central tendency,
rather than of some other measure such as the
median or weighted average. This choice is an
arbitrary one, since the arithmetic mean is far
from being the "best" measure of central tendency
for every kind of situation.


Two-Stage Sampling 





The fact that the sampling error in the usual
type of norms sample depends primarily on the
number of schools tested rather than on the number
of students tested, as seen in equations 2'
and 2'', suggests that the sampling errors could
be reduced without increasing the number of students
tested, provided arrangements could be
made to test more schools, with fewer students
per school. The practical problems involved in
such a procedure will be briefly discussed before
going on to the statistical considerations.

Although it might be possible for many school
principals to segregate a small number of students
from the group to which they belong in order to
administer one or more tests to them for normative
purposes, such a procedure would probably
not be advisable at the lower educational levels
since the fact of selecting and isolating these students
from their customary group might cause
them to give atypical test performances. Such students
might, for example, be unusually nervous
or uncooperative during the testing period.

Even if the segregation of small numbers of stu- 
dents at the primary or secondary educational lev- 
els is ruled out, the objective of testing more 
schools, with fewer students per school, can still 
be achieved in certain cases where two or more 
tests can be normed simultaneously. If it is pos- 
sible for the tests to be all administered simultan- 
eously in the same examination room, then, if 
there are k tests, each test need be administered 
to a subsample of only 1/k of the students in each 
school. 

A rather different situation pertains at the col- 
lege level. Here it is usually impossible to test 
all students in any one college simultaneously; 
furthermore, it is usually equally impossible, be- 
cause of conflicts in the students’ schedules, to ob- 
tain any sizable random subsample of students who 
can be tested at any one session. The only way a 
really random sample of college students can be 
obtained, therefore, is to work with very small 
subsamples (perhaps from 2 to 10 students) in each 
college. 

At the lower educational levels it is convenient 
for the size of the subsample to be proportional to 
the size of the school from which it is drawn. At 
the college level such a procedure might produce 
excessively large subsamples in the larger col- 
leges; hence it may be most convenient for the 
number of students subsampled from each college 
to be the same irrespective of the size of the col- 
lege. These two situations will be treated sepa- 
rately in the following sections. 

It is, of course, possible to fix the size of the
subsample in some way so that it is neither equal
for all schools nor proportional to the size of the
school. A general discussion of the almost endless
array of possible sampling and estimating
procedures will be found in texts on techniques
and theory of sampling [e.g., 1, 3]. Most of the
specialized formulas given below can be derived
from the very general formulas given in such texts,
but the process of simplifying the textbook formulas
and then translating them in terms of observable
statistics convenient for the present problem
is in most cases a more formidable task than deriving
the desired specialized formulas ab initio,
as in most cases will be done here.

An excellent, very readable and relevant discussion
of general sampling principles is given in [2].
In [6], Peaker describes an actual application of
two-stage sampling procedures for obtaining national
norms for 15-year-olds on a reading test.


Size of Subsample Proportional to
Size of School

In previous sections, there has been no subsampling
within each school, and the same symbol,
$M_i$, has denoted the "size" of school i and
the number of students sampled from school i,
since these have been identical quantities. Hereafter,
$m_i$ will be used to denote the number of students
in the subsample drawn from school i, and
$M_i$ will denote the total number of students in the
group in school i from which the subsample of $m_i$
students is drawn.

Let us first consider the case where the size
of the subsample in each school is to be proportional
to the "size" of the school, i.e., where $m_i$
is proportional to $M_i$. Here, as before, it is natural
to use the arithmetic mean of the scores of
students in the norms sample as an estimate of
the corresponding mean for the norms population.
This estimate will be denoted by $\bar{y}$, its formula
being the same as that given for $\bar{y}'$ in equation 1
except that the subsample mean, $\bar{y}_i$, has been
substituted for the "school mean" $\bar{Y}_i$:


$$\bar{y} = \frac{\sum^{n} M_i \bar{y}_i}{\sum^{n} M_i} = \frac{\sum^{n} m_i \bar{y}_i}{\sum^{n} m_i} . \tag{5}$$


As before, also, this estimate is biased; however, 
this does not seem to be sufficient reason to dis- 
courage its use. 

Since $\bar{y}$ results from a two-stage sampling procedure,
its sampling variance is a resultant of
sampling fluctuations occurring at each of the two
stages. [The reader who wishes to skip the derivation
should skip to equation 13 and refer to the
appendix for the meaning of the symbols appearing
there.] The total two-stage sampling variance
of $\bar{y}$ (denoted by $\mathrm{Var}_{12}\,\bar{y}$) can be expressed
as the sum of two components by means of a very
general formula:


$$\mathrm{Var}_{12}\,\bar{y} = E_1(\mathrm{Var}_2\,\bar{y}) + \mathrm{Var}_1(E_2\,\bar{y}) , \tag{6}$$


where E denotes the average or "expected" value
over all possible samples, and the subscripts 1
and 2 refer to the stage of sampling.

For any given set of n schools, $E_2\bar{y}$ is the expected
or average value of $\bar{y}$ over all possible
subsamples within the given set of schools. Since
the values of $m_i$ are fixed once the schools are
chosen, it is obvious that $E_2\bar{y} = \bar{y}'$. Thus the
second term in (6) is the same as the sampling
variance given by (2).
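
The decomposition in equation 6, and the statement that the second-stage expectation of the estimate equals the cluster-sample mean, can both be verified by complete enumeration on a very small population. The sketch below is illustrative only: the three schools and their scores are hypothetical, two schools are drawn by simple random sampling, and half the students in each drawn school are subsampled. Exact fractions are used so that the identity holds without rounding error.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    values = list(values)
    mu = mean(values)
    return mean((v - mu) ** 2 for v in values)

# Hypothetical population: three schools with 2, 4 and 2 students.
schools = [(3, 7), (4, 6, 8, 10), (1, 5)]
n, f = 2, Fraction(1, 2)          # schools per sample, subsampling fraction

per_sample = []                   # estimates grouped by first-stage sample
check_e2 = []                     # (E_2 of the estimate, cluster-sample mean)
for chosen in combinations(schools, n):
    # every possible second-stage subsample within each chosen school
    subs = [list(combinations(s, int(f * len(s)))) for s in chosen]
    ests = [mean(a + b) for a in subs[0] for b in subs[1]]
    per_sample.append(ests)
    y_prime = mean(chosen[0] + chosen[1])   # size-weighted mean of school means
    check_e2.append((mean(ests), y_prime))

cond_means = [mean(e) for e in per_sample]        # E_2 for each set of schools
cond_vars = [variance(e) for e in per_sample]     # Var_2 for each set of schools
grand = mean(cond_means)
total_var = mean(mean((e - grand) ** 2 for e in ests) for ests in per_sample)

print(total_var, mean(cond_vars) + variance(cond_means))
```

The two printed quantities are the left and right sides of equation 6 for this toy population.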

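In order to get the first term of (6), let $\bar{m} = (\sum^{n} m_i)/n$, so that


$$\mathrm{Var}_2\,\bar{y} = \mathrm{Var}_2\!\left(\frac{\sum^{n} m_i \bar{y}_i}{n\bar{m}}\right) = \frac{1}{n^2\bar{m}^2}\,\mathrm{Var}_2 \sum^{n} m_i \bar{y}_i , \tag{7}$$


the values of $m_i$ being fixed. Since the n values
of $\bar{y}_i$ vary independently,


$$\mathrm{Var}_2\,\bar{y} = \frac{1}{n^2\bar{m}^2} \sum^{n} m_i^2\,\mathrm{Var}_2\,\bar{y}_i . \tag{8}$$
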
Now $\mathrm{Var}_2\,\bar{y}_i$ is simply the usual sampling variance
of the sample mean obtained when $m_i$ cases are
drawn without replacement from a finite population
of $M_i$ cases:


$$\mathrm{Var}_2\,\bar{y}_i = \frac{M_i - m_i}{M_i m_i}\, S_i^2 = \frac{1 - f}{m_i}\, S_i^2 , \tag{9}$$


where $S_i^2$ is $M_i/(M_i - 1)$ times the variance
of the scores of the students in school i, and
$f = m_i/M_i$ is the (constant) subsampling fraction.
Thus,


$$\mathrm{Var}_2\,\bar{y} = \frac{1 - f}{n^2\bar{m}^2} \sum^{n} m_i S_i^2 . \tag{10}$$


Assuming either (i) that the variance of scores
within a school is, to an adequate approximation,
unassociated with size of school, or (ii) that
within-school variance is approximately the same
from school to school, it follows that


$$\mathrm{Var}_2\,\bar{y} = \frac{1 - f}{n\bar{m}}\, a(S_i^2) , \tag{11}$$


where $a(S_i^2) = \sum^{n} S_i^2/n$ is the arithmetic mean value
of the within-school variances ($S_i^2$) of the schools
in the sample.

It remains to find the expected value of the
quantity in (11) over all possible samples of
schools. Using again the assumption that the
within-school variance either is unrelated to school
size or is the same in all schools, it is found that,
approximately,


$$E_1(\mathrm{Var}_2\,\bar{y}) = \frac{1 - f}{n f \bar{M}}\, A(S_i^2) , \tag{12}$$


where $A(S_i^2)$ is the arithmetic mean value of the
within-school variances for the norms population.
Thus from (6), (12), and (2''),


$$\mathrm{Var}_{12}\,\bar{y} = \frac{1 - f}{n f \bar{M}}\, A(S_i^2) + \frac{1}{n}\,(1 + C_M^2)\, S^2(\bar{Y}_i) . \tag{13}$$


It is seen that if the subsampling fraction f
equals 1 (so that, in effect, there is no second-stage
sampling), the first term on the right side
of (13) vanishes and (13) reduces to (2''), as
would be expected.

Equation 13 shows how the sampling variance
of the mean of the norms sample depends both on
the number of schools in the sample and on the
proportion of students sampled from each school.
A numerical example will help to make clear the
economies that may be achieved by slightly increasing
the number of schools in the norms sample
while drastically reducing the number of students
tested in each school.

In the numerical example already given for the
sixth grade, $C_M = .55$, $S(\bar{Y}_i) = 4$, and $\bar{M} = 58$. Also,
since $S_y = 10$ and $S(\bar{Y}_i) = 4$, the value of $A(S_i^2)$
must be approximately equal to $S_y^2 - S^2(\bar{Y}_i) = 84$.
Given these numerical values, the top half of Table I
shows for different subsampling proportions
(f) the number of schools (n) that must be included
in the norms sample in order that the sample
mean shall have the same size of standard error
that would exist if the usual cluster-sample method
of norming were used with n = 36 schools, as
represented in the first line of the table. The
next-to-last column of the table shows the expected
number of examinees ($n f \bar{M}$) required to obtain this
degree of accuracy. The last column shows the
ratio of the number of examinees required under
the proposed two-stage sampling procedure to the
number of examinees required by the usual
cluster-sample method. The last line of the sixth-grade
table shows, for example, that if only 2 percent
of the students are tested in each school, the
total number of students tested need be only 9 percent
as many as would be required by the usual
norming procedures under which all students are
tested in any given school.

The second half of Table I gives similar information
for the tenth-grade numerical example;
here, $C_M = .84$ and $\bar{M} = 108$, the other values being
as before. The last column of the table shows
even greater economies resulting from two-stage
sampling.
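
The kind of calculation that underlies Table I can be sketched from equation 13. The sketch below assumes the parameter values quoted in the text ($A(S_i^2) = 84$, $S^2(\bar{Y}_i) = 16$, and the stated $C_M$ and $\bar{M}$ for each grade); the function names are introduced here for illustration, and the subsampling proportions other than 1.00 and .02 are chosen arbitrarily rather than taken from the table.

```python
def var13(n, f, m_bar_pop, a_within, c_m, s2_between):
    """Equation 13: two-stage sampling variance of the sample mean."""
    within = (1 - f) / (n * f * m_bar_pop) * a_within
    between = (1 + c_m ** 2) * s2_between / n
    return within + between

def schools_needed(f, m_bar_pop, a_within, c_m, s2_between, n_cluster=36):
    """Number of schools giving the same variance as cluster sampling
    (f = 1) with n_cluster schools."""
    target = var13(n_cluster, 1.0, m_bar_pop, a_within, c_m, s2_between)
    # Equation 13 is proportional to 1/n, so evaluate it at n = 1 and rescale.
    return var13(1, f, m_bar_pop, a_within, c_m, s2_between) / target

grades = {"sixth": dict(m_bar_pop=58, c_m=0.55),
          "tenth": dict(m_bar_pop=108, c_m=0.84)}
for grade, p in grades.items():
    for f in (1.0, 0.5, 0.1, 0.02):
        n = schools_needed(f, a_within=84, s2_between=16, **p)
        ratio = n * f / 36          # examinees relative to cluster sampling
        print(grade, f, round(n), round(ratio, 2))
```

With f = 1 the function returns the original 36 schools, which is the check described after equation 13; with f = .02 the sixth-grade ratio comes out near 9 percent, as stated in the text.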


Application of Standard Error Formulas to
Percentile Ranks in the Norms Table








As already pointed out, in the case of simple
random sampling from a normal distribution the
sampling variances of the various quantiles (percentile
points) have certain stated ratios to the
sampling variance of the mean. No simple results
for the corresponding sampling variances
in two-stage sampling are presently available to
the writer. Although the ratio obtained for the
sampling variance of the median in the simple
normal case probably provides a useful approximation
for many or most practical situations, any
similar generalization for the more extreme percentile
points would be of doubtful value.

Actually, the most common use of norms is not
to determine the score having a given percentile
rank, but rather to determine the percentile rank
of a given score. Fortunately any formula for the
standard error of the mean of the norms sample
can, by a simple change in the meaning of certain
symbols, be used to represent the standard error
of any percentile rank determined from the norms
table.

Let x denote any test score, and let P(x), or
simply P, denote the proportion of cases in the
norms population lying at or below a score of x.
Consider now a set of data, paralleling the test-score
data, in which every score at or below x has
been replaced by 1 and every score above x has
been replaced by 0. P is the arithmetic mean of
all these 0's and 1's for the whole norms population.
If the proportion of cases lying at or below
score x in a norms sample is denoted by p(x), or
simply by p, this sample estimate of P is simply
the arithmetic mean of the 0's and 1's in the norms
sample. Thus any formulas for the standard error
of the mean of a norms sample can be used
for the standard error of any percentile rank
simply by replacing $\bar{y}$ with p and $\bar{Y}$ with P.

Equation 13, for example, remains unchanged
except that $\bar{y}$ is replaced with p and $S^2(\bar{Y}_i)$ is replaced
by $S^2(P_i)$, where $P_i$ is the proportion of
cases in school i lying at or below x and $S^2(P_i)$ is
the variance of these proportions. The quantity
$S_i^2$, the within-school score variance, must now be
computed from the 0's and 1's in school i; the computation
may be simplified by using the familiar
formula $S_i^2 = P_i(1 - P_i)$. With these modifications,
equation 13 is ready to be applied for determining
the standard error of a percentile rank in
a sample norms table:


$$\mathrm{Var}_{12}\,p = \frac{1 - f}{n f \bar{M}}\, A(S_i^2) + \frac{1}{n}\,(1 + C_M^2)\, S^2(P_i) . \tag{14}$$
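
The substitution of 0's and 1's for scores can be sketched as follows. The function names are introduced here for illustration, and the variance of the school proportions is computed with an unweighted divisor equal to the number of schools, which is an assumption about the definition of $S^2(P_i)$ rather than something stated in the text.

```python
def percentile_rank(scores, x):
    """Proportion of scores at or below x: the mean of the 0/1 indicator data."""
    indicators = [1 if s <= x else 0 for s in scores]
    return sum(indicators) / len(indicators)

def within_variance_01(p_i):
    """Variance of the 0's and 1's in one school, S_i^2 = P_i (1 - P_i)."""
    return p_i * (1 - p_i)

def var14(n, f, m_bar_pop, p_schools, c_m):
    """Equation 13 applied to 0/1 data; p_schools holds the proportions P_i."""
    k = len(p_schools)
    a_within = sum(within_variance_01(p) for p in p_schools) / k
    p_mean = sum(p_schools) / k
    s2_between = sum((p - p_mean) ** 2 for p in p_schools) / k   # assumed divisor k
    return ((1 - f) / (n * f * m_bar_pop) * a_within
            + (1 + c_m ** 2) * s2_between / n)

# Hypothetical proportions at or below x in four schools.
p_schools = [0.35, 0.45, 0.55, 0.65]
for f in (1.0, 0.5, 0.1):
    print(f, round(var14(35 / f, f, 107, p_schools, 0.88) ** 0.5, 4))
```

In the usage lines the number of schools is raised as f is lowered so that the total number of examinees stays fixed, which is the comparison the subsequent table makes.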


It is obvious that the more the schools in any
particular norms population differ from each
other in the values of $P_i$ the more the advantage
to be gained by subsampling within schools. Some
idea of the size of the relevant standard errors
can be obtained from a numerical example based
on actual norms data.

The first line of Table II was computed from
the test scores of all tenth-grade students in a
35-school norms sample. Equation 2'' was used
with p and P substituted for $\bar{y}$ and $\bar{Y}$, and with
sample statistics substituted for the corresponding
unknown population values. In this particular
set of data, the total number of students in the
sample was 3748, the average school size was
107, the coefficient of variation for school size
was .88, and the correlation between school size
and school mean score was 0.18.


TABLE II

THE STANDARD ERROR OF THE PERCENTILE RANK OF SCORES NEAR SELECTED
POPULATION PERCENTILE POINTS FOR VARIOUS SUBSAMPLING RATIOS

                                Standard Error of Percentile Rank
                                         Near Population
Subsampling    Number of
ratio (f)      schools in       50th          75th          90th
               sample (n)       percentile    percentile    percentile

  1.00*             35            .041          .030          .015
   .50              70            .029          .022          .011
   .10             350            .015          .011          .006
   .01            3500            .009          .007          .005

*This row represents the usual type of simple cluster sampling.


TABLE III

A COMPARISON OF THE STANDARD ERROR OF THE MEAN OBTAINED BY THREE
METHODS OF SAMPLING COLLEGE FRESHMEN (n = 100)

Expected                        Number of         Standard Error of Mean
number of       Subsampling     examinees
examinees       proportion      per school              Equations:
tested (nfM)    (f)             (m or fM)*        (13)     (18)     (20)**

  53,100          1.00             531             .66      .66      .40
  26,550           .50             266             .66      .67      .40
  13,275           .25             133             .66      .67      .41
   5,310           .10              53             .67      .69      .42
   2,655           .05              27             .68      .72      .44
   1,062           .02              11             .72      .81      .49
     531           .01               5             .77      .93      .56

* In the case where the size of subsample is proportional to school size (13), this column
gives fM, the average number of examinees per school.

**Assuming school size to be unrelated to school achievement and to within-school variance.

It should be noted that Table II gives the standard
errors of percentages, not of percentile points.
The standard error of the mean of a norms distribution
is expressed in test-score units; the standard
errors in Table II are not. The table states,
for example, that in simple cluster sampling the
proportion of cases in the norms sample lying below
the population median will fluctuate from sample
to sample with a standard deviation of .041;
i.e., the population median score would have an
observed percentile rank between 45.9 and 54.1
in about two-thirds of all similar norms samples
that might be drawn.

All four lines of Table II refer to sampling
from the same norms population. The last three
lines relate to two-stage sampling with three different
subsampling ratios, the standard errors
being computed from equation 14. The total number
of students in the norms sample is the same
throughout the table; for example, the second line
presents standard errors for the situation where
only half the students in each school are tested
but twice as many schools are included in the sample.
It is seen that a change in the subsampling
ratio produces a relative change in the standard
error that, with minor exceptions, is about the
same at all percentile points.


Two-Stage Sampling with Size of 
Subsample Fixed 








Up to now, the main consideration has been
with the situation where the size of the subsample
($m_i$) is proportionate to the size of the school
($M_i$). This procedure conveniently gives the
larger schools a weight appropriate for their size.
As already pointed out, in certain situations, especially
in testing college students for norms purposes,
it may be administratively quite impossible
to obtain truly random subsamples if the subsample
size must be proportional to size of the
college. It is helpful, therefore, to consider the
case where the size of the subsample is the same
for all subsamples. The more general case where
the size of subsample may be arbitrarily varied
from college to college will not be treated here.
The interested reader can find the necessary
basic theory in appropriate texts [e.g., 1, Ch. 11].


Two possible ways of giving proper weight to 
the larger schools suggest themselves. These 
will be treated separately. 

The necessary notation is the same as before
except that $m_i$ is constant and the subscript i will
consequently be dropped from this symbol.


Weights Applied Computationally After 
Collection of Data 








When the schools in the sample have been drawn
at random, the weighted mean


$$\bar{y}_w = \frac{\sum^{n} M_i \bar{y}_i}{\sum^{n} M_i} \tag{15}$$


provides an acceptable estimate of the mean score
in the norms population. The formula for $\bar{y}_w$ is
the same as the formula for $\bar{y}$ given previously; in
the present case, since size of subsample is not
proportionate to size of school, equation 15 represents
a weighted rather than an unweighted mean
of the scores of the students in the sample. Equation
15 assigns appropriate weights to the various
values of $\bar{y}_i$ according to the size of the school that
each mean represents.

These weights ($M_i$) are not, in general, the
weights that will minimize the sampling variance
of the estimate. It may happen, in fact, that the
unweighted mean of the $\bar{y}_i$ will have a smaller sampling
variance than will the weighted mean $\bar{y}_w$.
Both the weighted and the unweighted means of the
$\bar{y}_i$ are, in general, biased. The unweighted mean,
however, has an additional disadvantage so serious
as to prevent its use for many or most purposes
and to rule it out of further consideration
here. The unweighted mean is not only a biased
estimate, it is an inconsistent estimate; i.e., its
value does not, in general, approach that of the
population mean when the sample is made large.
(An exception would arise if it were known with
certainty that school size and school achievement
were unrelated in the population, since in this
case there would be no advantage in weighting the
subsample mean according to school size. The assumption
that school achievement is unrelated to
school size is often adequate for providing a convenient
approximation to the sampling error of a
sample estimate. The order of approximation required
for such a purpose is quite different from
that required for published test norms, so this assumption
cannot in general be brought forward to
justify the use of the unweighted mean here.)
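
The inconsistency of the unweighted mean can be seen without any sampling at all: even if every school in the population is included and each subsample mean coincides with its school mean, the unweighted mean does not reach the population mean when size and achievement are related. The following sketch uses a hypothetical three-school population; the function names are introduced here for illustration.

```python
def weighted_mean(sizes, subsample_means):
    """Equation 15: school subsample means weighted by school size M_i."""
    return sum(M * y for M, y in zip(sizes, subsample_means)) / sum(sizes)

def unweighted_mean(subsample_means):
    return sum(subsample_means) / len(subsample_means)

# Hypothetical population in which larger schools score higher.
sizes = [50, 100, 400]
school_means = [40.0, 50.0, 60.0]

population_mean = weighted_mean(sizes, school_means)   # mean over all students
print(population_mean, unweighted_mean(school_means))
```

The weighted mean reproduces the mean over all students, whereas the unweighted mean remains at the simple average of the school means however many schools are taken.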

It remains to obtain a standard error formula
for the weighted mean $\bar{y}_w$, which can be found, very much as before,
from equation 6. Again, it is apparent that $E_2\bar{y}_w = \bar{y}'$.
The equation corresponding to (10) is readily
found to be


$$\mathrm{Var}_2\,\bar{y}_w = \frac{1}{m n^2 a^2(M_i)} \sum^{n} (M_i - m) M_i S_i^2 , \tag{16}$$


where $a(M_i) = (\sum^{n} M_i)/n$ is the average size of the
schools in the sample. Equation 16 simplifies to


$$\mathrm{Var}_2\,\bar{y}_w = \frac{1}{mn}\, a(S_i^2) \left[ (1 + c_M^2) - \frac{m}{a(M_i)} \right] , \tag{17}$$


where $c_M = S(M_i)/a(M_i)$ is the coefficient of variation
of the $M_i$ computed for the sample.
Finally,


$$\mathrm{Var}_{12}\,\bar{y}_w = \frac{1}{nm}\, A(S_i^2) \left[ 1 + C_M^2 - f \right] + \frac{1}{n}\, S^2(\bar{Y}_i)(1 + C_M^2) , \tag{18}$$


approximately.
If schools do not vary much in size,


$$\mathrm{Var}_{12}\,\bar{y}_w = \frac{1}{nm}\,(1 - f)\, A(S_i^2) + \frac{1}{n}\, S^2(\bar{Y}_i) . \tag{18'}$$


Since there is no difference whatever between $\bar{y}_w$
and $\bar{y}$ when all schools are of the same size, a
check is provided by the fact that (13) and (18')
are equivalent when $C_M = 0$. It is seen that when
the schools are not all the same size, (18) is larger
than (13) for a fixed total number of examinees
tested; hence the present method is inferior to
choosing size of subsample proportional to size
of school whenever the latter is administratively
feasible.
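
The check described above, and the ordering of the two variances when schools differ in size, can be sketched numerically. The sketch assumes that f stands for $m/\bar{M}$ when the subsample size is fixed, so that both designs test the same expected number of examinees; the function names and the numerical values are hypothetical.

```python
import math

def var13(n, f, m_bar_pop, a_within, c_m, s2_between):
    """Equation 13: subsample size proportional to school size."""
    return ((1 - f) / (n * f * m_bar_pop) * a_within
            + (1 + c_m ** 2) * s2_between / n)

def var18(n, m, m_bar_pop, a_within, c_m, s2_between):
    """Equation 18: fixed subsample size m, weights M_i applied afterwards."""
    f = m / m_bar_pop
    return ((a_within / (n * m)) * (1 + c_m ** 2 - f)
            + (1 + c_m ** 2) * s2_between / n)

# Hypothetical values; both designs test n * m examinees in expectation.
n, m, m_bar_pop, a_within, s2_between = 100, 10, 200, 84, 16
for c_m in (0.0, 0.5, 1.0):
    v13 = var13(n, m / m_bar_pop, m_bar_pop, a_within, c_m, s2_between)
    v18 = var18(n, m, m_bar_pop, a_within, c_m, s2_between)
    print(c_m, round(v13, 4), round(v18, 4), math.isclose(v13, v18))
```

The two expressions agree when the coefficient of variation of school size is zero and separate as it grows, the excess of (18) over (13) being the within-school term multiplied by $C_M^2$.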

The foregoing results can be extended beyond
the problem of estimating the mean score of the
norms population to cover all the percentile ranks
in the norms table. Just as the unweighted mean
in the present situation does not in general provide
an acceptable estimate of the mean of the population,
so will a simple frequency distribution
of the scores in the norms sample fail to provide
adequate estimates of the population percentile
ranks. If the norms table is to avoid being biased
even in large samples, the results from each
school must be weighted according to the size of
the school. In preparing a frequency distribution
for norms purposes, this can be achieved by
counting each student in the norms sample as if
he were actually $M_i$ students. When this is done,
equation 18 may be used to give the sampling variances
of the percentile ranks appearing in the
sample norms table, the method for doing this being
the same as that discussed in the previous
section.
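
The weighted frequency count can be sketched as follows, each sampled student being counted as many times as there are students in his school. The record layout and the scores are hypothetical.

```python
def weighted_percentile_rank(records, x):
    """records: (score, school_size) pairs, one per student in the norms sample.

    Each student counts as school_size students, as the text describes."""
    total = sum(size for _, size in records)
    at_or_below = sum(size for score, size in records if score <= x)
    return at_or_below / total

# Hypothetical sample: two students from a school of 400, two from a school of 100.
records = [(55, 400), (65, 400), (40, 100), (50, 100)]
weighted = weighted_percentile_rank(records, 50)
unweighted = sum(1 for s, _ in records if s <= 50) / len(records)
print(weighted, unweighted)
```

With a fixed subsample size the unweighted count gives the small school the same influence as the large one, which is the source of the bias that the weighting removes.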


Sampling with Probability Proportional to Size 





When the size of the subsample is constant for
all schools, extra weight may be given to the larger
schools by weighting the data from these
schools as part of the computations, as discussed
in the previous section. An alternative is to incorporate
the desired weights into the sampling process,
eliminating any necessity for computational
weighting of the sample data after it has been collected.

All sampling methods considered up to this
point start by drawing a simple random sample of
schools from the population of schools, each school
having an equal opportunity to appear in the sample.
It may be noted incidentally that not only do the
schools have an equal opportunity to appear in the
norms sample, but also, in the case of simple
cluster sampling and in the case of two-stage sampling
with size of subsample proportional to size
of school, each student in the norms population
has an equal probability of being included in the
norms sample. Under the procedure to be considered
in the present subsection, the selection of
schools is so carried out that the probability of
any school appearing in the norms sample is proportional
to the size of the school; once the sample
of schools has been selected in this way, the subsampling
process is carried out as in the previous
section, m students being selected at random from
each school. The method of the immediately preceding
subsection will be referred to as the method
of weighted averages; the method of the present
subsection will be referred to as the method of
weighted sampling.

In the present method, weights are given to the
schools during the sampling process; hence an obvious
sample statistic to use to estimate the mean
score of the norms population is the unweighted
sample mean


$$\bar{y}_u = \frac{1}{n} \sum^{n} \bar{y}_i . \tag{19}$$


When used with the method of weighted sampling,
$\bar{y}_u$ gives an unbiased estimate of the population
mean score.

It remains to determine the sampling variance
of this statistic. As a first step, it is helpful to
transform the problem into one involving only simple
random sampling. This is achieved by noting
that the effect of applying the present pps (probability
proportional to size) sampling method to a
given population is effectively the same as if an ordinary
random sampling process had been applied
to a population differing from the given population
in certain specified ways. Specifically, if schools
of size $M_i$ occur with frequency $f_i$ in a given
norms population, a pps sample of schools drawn
from this population is effectively the same as if
it had been drawn by simple random sampling
from a population in which schools of size $M_i$ occur
with frequency proportional to $f_i M_i$. In order
to obtain the sampling variance of $\bar{y}_u$, it is merely
necessary to apply equation 6, computing the
necessary population means and variances from
the modified population rather than from the actual
norms population.

It is quickly found that


$$\mathrm{Var}_{12}\,\bar{y}_u = \frac{1}{nm}\, A^*(S_i^2) \left( 1 - \frac{m}{\bar{M}} \right) + \frac{1}{n}\, S^{*2}(\bar{Y}_i) , \tag{20}$$


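where the "stars" indicate that the population statistics
are computed from the modified population
described in the preceding paragraph. Formulas
for $A^*(S_i^2)$ and $S^{*2}(\bar{Y}_i)$ in terms of the unmodified
population values are:


$$A^*(S_i^2) = \frac{\sum M_i S_i^2}{\sum M_i} , \tag{21}$$


$$S^{*2}(\bar{Y}_i) = \frac{\sum M_i (\bar{Y}_i - \bar{Y})^2}{\sum M_i} , \tag{22}$$


the sums running over all schools in the norms population.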

If school size is totally unrelated to school
achievement and to within-school variance, then
the values of $A(S_i^2)$ and $S^2(\bar{Y}_i)$ will be the same for
the modified population as for the actual population,
and the asterisks may be dropped from (20).
If f in (13) is replaced by $m/\bar{M}$, it is seen that in
this special case the method in which size of subsample
is proportional to size of school yields an
estimate having a larger sampling variance than
does the present method (equation 20), the difference
between the two variances being $C_M^2 S^2(\bar{Y}_i)/n$.
In this same special case, the sampling variance
of the estimate obtained by the method of weighted
averages, as given in equation 18, is also clearly
greater than that of the estimate obtained by the
present method; if the subsampling fraction is
small, (18) is greater by a factor of approximately
$(1 + C_M^2)$.
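
The statistics of the modified population follow from its definition: each school is counted in proportion to its size. The sketch below computes them for a hypothetical two-school population and inserts them into equation 20; the function names are introduced here for illustration, and the mean about which the school means vary is taken to be the size-weighted mean, which is what the modified population implies.

```python
def modified_population_stats(sizes, within_vars, school_means):
    """A*(S_i^2) and S*^2(Y_i): each school counted in proportion to its size."""
    total = sum(sizes)
    a_star = sum(M * s2 for M, s2 in zip(sizes, within_vars)) / total
    y_bar = sum(M * y for M, y in zip(sizes, school_means)) / total
    s2_star = sum(M * (y - y_bar) ** 2 for M, y in zip(sizes, school_means)) / total
    return a_star, s2_star

def var20(n, m, m_bar_pop, a_star, s2_star):
    """Equation 20: pps sampling of schools, m students per school."""
    return (1 - m / m_bar_pop) * a_star / (n * m) + s2_star / n

# Hypothetical population of two schools.
sizes, within_vars, school_means = [100, 300], [80, 100], [48, 52]
a_star, s2_star = modified_population_stats(sizes, within_vars, school_means)
print(a_star, s2_star, var20(10, 5, 200, a_star, s2_star))
```

When the within-school variances are equal and the school means do not depend on size, the starred quantities reduce to their ordinary counterparts, which is the special case discussed above.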

The comparison between the three methods is
illustrated by Table III, which is based on college-freshmen
data with n = 100, $\bar{M} = 531$, $S_M = 700$,
$S^2(\bar{Y}_i) = 16$, $A(S_i^2) = 84$. It is seen that for college
freshmen, where schools vary greatly in size,
the method of weighted sampling shows great superiority
over the other methods, at least under
the assumptions stated in the footnote to the table.
The table also shows how the number of examinees
can be greatly reduced by subsampling without
seriously affecting the standard error of the
norms.
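
The standard errors of Table III can be recomputed from equations 13, 18, and 20. The sketch below is offered under two assumptions: that the value 700 is the standard deviation of school size, so that $C_M = 700/531$, and that school size is unrelated to achievement and to within-school variance (the footnote condition), so that the asterisks in (20) may be dropped. The subsample size is taken as $m = f\bar{M}$.

```python
import math

N_SCHOOLS, M_BAR, S_M = 100, 531, 700      # S_M read as the s.d. of school size
S2_BETWEEN, A_WITHIN = 16, 84
C2 = (S_M / M_BAR) ** 2                    # squared coefficient of variation

def se13(f):
    """Equation 13: subsample proportional to school size."""
    return math.sqrt((1 - f) / (N_SCHOOLS * f * M_BAR) * A_WITHIN
                     + (1 + C2) * S2_BETWEEN / N_SCHOOLS)

def se18(f):
    """Equation 18: fixed subsample size, method of weighted averages."""
    m = f * M_BAR
    return math.sqrt(A_WITHIN / (N_SCHOOLS * m) * (1 + C2 - f)
                     + (1 + C2) * S2_BETWEEN / N_SCHOOLS)

def se20(f):
    """Equation 20 without asterisks: method of weighted sampling."""
    m = f * M_BAR
    return math.sqrt((1 - f) * A_WITHIN / (N_SCHOOLS * m)
                     + S2_BETWEEN / N_SCHOOLS)

for f in (1.00, 0.50, 0.25, 0.10, 0.05, 0.02, 0.01):
    print(f, round(N_SCHOOLS * f * M_BAR),
          round(se13(f), 2), round(se18(f), 2), round(se20(f), 2))
```

Under these assumptions the recomputed values agree with the tabled ones to within one unit in the second decimal place.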

Although the foregoing provides a good basis
for preferring the present method to the other
methods, the situation is far from clear cut. The
modified distribution of schools usually differs
very drastically from the actual distribution of
schools. The values of $A(S_i^2)$ and of $S^2(\bar{Y}_i)$ for
the modified distribution are determined primarily
by the large schools; if the large schools tend
to have higher within-school variances or if they
tend to differ more from each other in average
achievement than do the smaller schools, the sampling
variance of (20) may become larger than that
of (13) or (18). The computational weights and
sampling probabilities that will together produce
sample estimates with minimum standard errors
have been determined theoretically, but these theoretical
results cannot be applied in practice without
detailed a priori knowledge of the population
to be sampled.

There is a practical difficulty that may arise
during the sampling process in the method of
weighted sampling whenever the schools in the population
differ widely in size. Suppose, as may
well be the case, that some schools are 100 times
as large as are other schools that occur equally
frequently in the population. A pps sample should
then contain about 100 times as many of the former
as of the latter schools. The total norms population,
however, might contain only one or two of
the larger schools.

In such a case, pps sampling is possible only if 
the first-stage sampling is done with replacement. 
This procedure is quite satisfactory statistically, 
but it is not always administratively feasible to 
test a large number of students in a single school, 
as has been pointed out previously. 

When the schools are sampled with replacement,
so that a single school may be drawn several times,
a new subsample must be taken from the school
each time the school is drawn. Equation 20 will
still apply, providing second and subsequent subsamples
are allowed to overlap the first, so that
the same student may appear in more than one subsample.
It would be more efficient, however, to
carry out the subsampling without replacement of
students, in which case the sampling error would
be less than that given by (20).
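
The procedure just described can be sketched as a small simulation: schools are drawn with replacement with probability proportional to size, an independent subsample of m students is taken at each draw (so that subsamples from the same school may overlap), and the unweighted mean of the subsample means is formed. The population, the seed, and the function name are hypothetical; Python's standard `random.choices` is used for the weighted draw.

```python
import random

def pps_sample_mean(schools, n, m, rng):
    """Draw n schools with probability proportional to size, with replacement,
    take a fresh random subsample of m students at each draw, and return the
    unweighted mean of the n subsample means."""
    sizes = [len(scores) for scores in schools]
    drawn = rng.choices(schools, weights=sizes, k=n)                  # first stage
    sub_means = [sum(rng.sample(scores, m)) / m for scores in drawn]  # second stage
    return sum(sub_means) / n

# Hypothetical population: one large school and ten small ones.
rng = random.Random(0)
schools = [[rng.gauss(55, 10) for _ in range(400)]] + \
          [[rng.gauss(45, 10) for _ in range(20)] for _ in range(10)]
all_scores = [s for scores in schools for s in scores]
population_mean = sum(all_scores) / len(all_scores)

estimates = [pps_sample_mean(schools, n=20, m=2, rng=rng) for _ in range(2000)]
print(round(population_mean, 2), round(sum(estimates) / len(estimates), 2))
```

Over repeated samples the average of the estimates settles close to the population mean, in keeping with the unbiasedness of the unweighted mean under weighted sampling, even though the single large school is drawn many times in most samples.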

It is obviously possible to make a compromise
between the method of weighted sampling and the
method of weighted averages. The weighting can
be done during the sampling process insofar as is
convenient. Further weighting can then be effected
computationally by applying appropriate numerical
weights to the sample data collected. Sampling
variances appropriate for such procedures
may be obtained by consulting the references.

The method of weighted sampling as described 
requires that school size be known at the time the 
sample is drawn. If school size is not known ex- 
actly but some estimate of the size of each school 
is available, these estimates may be used instead 
of the true values. The effect of this procedure 
on the standard error of the resulting statistic can 
also be determined by formulas given in references. 


School-Mean Norms 


Most test publishers provide no appropriate
norms for the use of administrators who are interested
in the mean score of a school rather than in
the score of any individual. Norms appropriate
for such administrators should represent the frequency
distribution of school means.

The two-stage sampling procedures recommended
in the preceding sections have the disadvantage
that the data obtained do not provide the
exact mean score for any school, but only for a
sample of the students in that school. This disadvantage,
at least in theory, is more than compensated
for by the fact that many more schools are
represented in the sample data obtained. Some
convenient method is needed, however, for reconstructing
the school-mean norms from the sample
data.

$s^2(\bar{y}_i)$, the actual variance of the n sample
means, is a sample estimate of the corresponding
standard error $\mathrm{S.E.}^2(\bar{y}_i)$. This standard error
is made up of two independent sources of fluctuations
(cf. eq. 6):


$$\mathrm{S.E.}^2(\bar{y}_i) = S^2(\bar{Y}_i) + \frac{1}{N} \sum^{N} \mathrm{S.E.}_i^2(\bar{y}_i) , \tag{23}$$


where $\mathrm{S.E.}_i^2(\bar{y}_i)$, the sampling variance for the
mean of the subsample drawn from school i, is
the usual variance in sampling from a finite population:


$$\mathrm{S.E.}_i^2(\bar{y}_i) = \frac{M_i - m_i}{m_i M_i}\, S_i^2 . \tag{24}$$


A sample estimate of $S^2(\bar{Y}_i)$ is obtained from
(23) and (24) by substituting sample estimates for
the other two variances:


$$\hat{S}^2(\bar{Y}_i) = s^2(\bar{y}_i) - \frac{1}{n} \sum^{n} \frac{M_i - m_i}{m_i M_i}\, s_i^2 . \tag{25}$$


Equation 25 is an obvious extension of the usual 
rationale and methods of variance-components an- 
alysis. In general, equation 25 should provide 
a better estimate of the true variance of the school- 
mean distribution than would have been obtained 
by the usual methods, since these use much fewer 
schools. 
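
The estimate in equation 25 can be sketched as follows. The sketch assumes that both the variance of the subsample means and the within-school sample variances are computed with the usual n - 1 divisor (as `statistics.variance` does); the text does not specify the divisor, and the data are hypothetical. At least two students per school are needed for the within-school variances.

```python
from statistics import mean, variance

def school_mean_variance(sizes, subsamples):
    """Equation 25: variance of the subsample means minus the average
    within-school sampling variance of those means."""
    n = len(subsamples)
    sub_means = [mean(scores) for scores in subsamples]
    between = variance(sub_means)                 # s^2 of the n sample means
    correction = sum(
        (M - len(scores)) / (len(scores) * M) * variance(scores)
        for M, scores in zip(sizes, subsamples)
    ) / n
    return between - correction

sizes = [100, 100]                     # hypothetical school sizes M_i
subsamples = [[10, 12], [20, 24]]      # hypothetical subsample scores
print(school_mean_variance(sizes, subsamples))
```

The subtraction removes the part of the observed spread of subsample means that is due only to having tested a few students in each school.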

If the estimated variance $\hat{S}^2(\bar{Y}_i)$ does not differ
too much from the sample value $s^2(\bar{y}_i)$, it can often
be assumed that the shape of the frequency distribution
of school means would not differ too
much from the shape of the frequency distribution
of sample means except for a change in scale corresponding
to the change in variance. If necessary,
the logic by which equation 25 was developed
could be extended to provide estimates of
the higher moments of the frequency distribution
of school means, and the shape of this frequency
distribution could be determined from these estimated
moments. If only the variance of the school-mean
distribution is to be estimated, no more than
two students need be tested in each school; larger
samples from each school would be necessary if
the higher moments were required.
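
As a rough illustration of the estimate in equation 25, the following Python sketch computes the estimated variance of the school-mean distribution from subsample data. The function name, the data layout (a list of pairs holding the school size and the subsample scores), and the use of n - 1 divisors in the sample variances are assumptions made for the example; they are not specified in the text.

```python
from statistics import mean, variance


def estimate_school_mean_variance(schools):
    """Estimate the variance of true school means (sketch of eq. 25).

    schools: list of (M_i, scores_i) pairs, where M_i is the number of
    students in school i and scores_i holds the m_i sampled scores
    (at least two per school, so a within-school variance exists).
    """
    n = len(schools)
    # Variance of the n observed subsample means (n - 1 divisor, assumed).
    subsample_means = [mean(scores) for _, scores in schools]
    s2_between = variance(subsample_means)
    # Average finite-population sampling variance of a subsample mean (eq. 24).
    within_term = 0.0
    for school_size, scores in schools:
        m_i = len(scores)
        s2_i = variance(scores)
        within_term += (school_size - m_i) / (m_i * school_size) * s2_i
    # The result can be negative in small samples; it is left unclipped here.
    return s2_between - within_term / n
```

When every student in every school is tested (m_i equal to M_i), the correction term vanishes and the estimate reduces to the observed variance of the school means.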


Stratified Sampling 





In stratified sampling, each stratum is repre- 
sented in the sample by a predetermined number 
of cases. This distinguishes it from cluster sam- 
pling, since the presence or absence of any clus- 
ter in the sample is a matter of chance. 

Up to this point, no mention has been made of 
the unquestionable desirability of stratifying the 
schools in the norms population before the sample 
of schools is drawn. Any of the methods of sampling
already discussed can then be applied separately
to each of the resulting strata.

In the case of a nationwide norms population, it 
will almost surely be important to set up two or 
more strata based upon geographical location. Ad- 
ditional strata at the college level might be (i) jun- 
ior colleges and senior colleges; (ii) male, female, 
and coeducational colleges; (iii) liberal arts colleges,
teachers colleges, etc.; (iv) public, private,
and church institutions. Ideally all of these dimen- 
sions should be taken into account simultaneously; 
for example, one stratum should consist of south- 
ern junior coeducational private liberal arts col- 
leges. 

The foregoing are plausible dimensions for use 
as bases for stratification because each seems 
likely to be related to test score. On the other
hand, school size may be a useful basis for stratification
even when it is totally unrelated to test
score. This last is suggested by the fact that the
standard errors represented by formulas 2, 13,
and 18 would all be reduced if $C_M$, the coefficient
of variation for school size, were equal to zero.

If schools were rigorously stratified on size, it
would make no difference which of the last three
sampling methods discussed is applied within each
stratum, since the corresponding sample means are identical
whenever all schools are the same size ($\bar{y}'$ is a special
case of $\bar{y}$, occurring when the subsampling fraction
is 1). Furthermore, the sample mean score
within stratum $h$ ($\bar{y}_h$, say) would be an unbiased estimate
of the population mean score for stratum $h$.
Hence, an unbiased estimate of $\bar{Y}$, the general population
mean, would be given by the sample statistic

$$\frac{\sum_h M_h N_h \bar{y}_h}{\sum_h M_h N_h}, \qquad (26)$$

where $M_h$ is the size of each school in stratum $h$
and $N_h$ is the number of schools in stratum $h$.
It is easily seen that the sampling variance of this statistic is

$$\frac{1}{\left(\sum_h M_h N_h\right)^2} \sum_h M_h^2 N_h^2 \operatorname{Var}_h \bar{y}_h. \qquad (27)$$


If only a small proportion ($g_h$) of the schools in
each stratum are used in the sample, then $\operatorname{Var}_h \bar{y}_h$
can be obtained from (13), (18), or (20) by setting
all $M_i$ equal and thus setting $C_M = 0$, the result
being

$$\operatorname{Var}_h \bar{y}_h = \frac{1 - f_h}{n_h f_h M_h} A_h(S_i^2) + \frac{1}{n_h} S_h^2(\bar{Y}_i), \qquad (28)$$

where $n_h$, $f_h$, $A_h$, and $S_h$ are the values of $n$, $f$, $A$,
and $S$ for stratum $h$. If $g_h$ is not small, the appropriate
modification of (28) is

$$\operatorname{Var}_h \bar{y}_h = \frac{1 - f_h}{n_h f_h M_h} A_h(S_i^2) + \frac{1 - g_h}{n_h} S_h^2(\bar{Y}_i). \qquad (29)$$

The standard errors given by (28) and (29) are 
generally less than those given by (13), (18), or 
(20) because now $C_M = 0$. The standard error of
the estimate obtained in stratified sampling is seen
by (27) to be a weighted average of the standard 
errors relating to the separate strata, as given 
by (28) and (29). In most cases, therefore, the 
standard error of the estimate of the norms will 
be reduced by stratifying on school size before 
drawing a sample of schools. 
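
The stratified estimate and its sampling variance can be sketched in Python as follows. The Stratum container and its field names are hypothetical, and the within-school and between-school variances are treated as given inputs (in practice they would themselves be estimates). The sketch follows the weighted mean of stratum means described above together with the forms of equations 27 and 29; setting the sampled proportion of schools to zero gives the small-sample form of equation 28.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    M: float         # size of each school in the stratum (M_h)
    N: int           # number of schools in the stratum (N_h)
    n: int           # number of schools sampled (n_h)
    f: float         # within-school subsampling fraction (f_h)
    y_bar: float     # sample mean score within the stratum
    A_S2: float      # mean within-school variance, A_h(S_i^2)
    S2_means: float  # variance of school means, S_h^2(Y_i)


def stratum_variance(s):
    """Sampling variance of the stratum mean, following eq. 29."""
    g = s.n / s.N  # proportion of the stratum's schools in the sample
    within = (1 - s.f) / (s.n * s.f * s.M) * s.A_S2
    between = (1 - g) / s.n * s.S2_means
    return within + between


def stratified_mean(strata):
    """Stratum means weighted by stratum enrollment M_h * N_h."""
    total = sum(s.M * s.N for s in strata)
    return sum(s.M * s.N * s.y_bar for s in strata) / total


def stratified_variance(strata):
    """Sampling variance of the stratified mean, following eq. 27."""
    total = sum(s.M * s.N for s in strata)
    return sum((s.M * s.N) ** 2 * stratum_variance(s) for s in strata) / total ** 2
```

The weights M_h N_h are simply the numbers of students in each stratum, so a stratum of many small schools and a stratum of few large schools can carry equal weight.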

It will often be the case that there are several 
dimensions, such as geographical region and type 
of support (public or private), on which the popu- 
lation should be stratified, because these dimen- 
sions are correlated with test score. Since it is 
difficult and often impossible to stratify properly 
on more than two or three dimensions simultaneously,
it will frequently be more satisfactory to
deal with school size by the method of weighted 
sampling rather than by attempting to stratify on 
size. 


Optimum Sampling Procedures 





Formulas are available for determining opti- 
mum size of sample and optimum size of subsam- 
ple for all the methods discussed, as well as for de- 
termining optimum weights for the method of weight- 
ed averages and optimum sampling probabilities 
for the method of weighted sampling. Such formu- 
las usually require some knowledge of the relation 
between school size, school mean score, and with- 
in-school variance for the schools in the norms 
population. In addition, these formulas require detailed
information (or estimates) as to the economic
costs characterizing each type of sampling and
computational activity involved in norming. It is
necessary to know, for example, how costs are affected
by increasing the number of schools while simultaneously
decreasing the total number of students
tested.

The possibility and theoretical desirability of 
multistage sampling procedures will also have suggested
itself to the reader. The school system,
the single school, and the single class within a school
all represent possible stages in a multistage sampling
procedure.

The interested reader is referred to standard 
texts on sampling methods for further details. The 
methods discussed in previous sections of the present
article will serve admirably, however, whenever
the detailed information necessary to determine
optimum procedures is lacking, as will frequently
be the case.


Summary of Notation 





$A(S_i^2) = \sum S_i^2 / N$: Arithmetic mean of $S_i^2$ for
norms population.

$a(s_i^2) = \sum s_i^2 / n$: Arithmetic mean of $s_i^2$ for
sample.

$a(M_i) = (\sum M_i)/n$: Average size of the schools in
the sample.

$C_M = S_M / \bar{M}$: Coefficient of variation of
school size in norms population.

$c_M = s(M_i)/a(M_i)$: Sample coefficient of variation
for size of school.

$f = m_i / M_i$: Proportion of students tested in
each school when proportionate
subsampling is used.

Value of $m_i$ when the same size
of sample is drawn within each
school.

$\bar{M}$: Mean size of school in norms
population.

Average size of within-school
sample.

$M_i$: "Size of school $i$," the number
of students in the $i$th first-stage
sampling unit.

$m_i$: Size of sample within school $i$.

$N$: Number of schools in norms
population.

$n$: Number of schools in norms
sample.

Correlation between school
mean score and school size in
norms population.

$S_i$: Standard deviation of students'
scores in school $i$ multiplied
by $M_i/(M_i - 1)$.

$s_i$: Standard deviation of students'
scores in subsample from
school $i$ multiplied by $m_i/(m_i - 1)$.

$S_M$: Standard deviation of school
size in norms population.

Standard deviation of all individual
scores in norms population.

$S(\bar{Y}_i)$: Standard deviation of school
mean score in norms population.

$s(\bar{y}_i)$: Observed standard deviation of
subsample means multiplied by
$n/(n - 1)$.

$\bar{Y}$: Mean score in norms population.

$\bar{y}$: Mean score of norms sample obtained
by two-stage sampling
with proportionate subsampling (5).

$\bar{Y}_i$: Mean score of school $i$.

$\bar{y}_i$: Mean score in sample from
school $i$.

$\bar{y}'$: Mean score in norms sample obtained
by simple cluster sampling (1).

Unweighted mean of norms sample
when size of subsample is
fixed.

A weighted mean score of norms
sample when size of subsample
is fixed.

Unbiased estimate of the mean
of the norms population in simple
cluster sampling (3).


Whenever actual norms data are available, the 
formulas in standard texts will usually be preferable,
since these usually require fewer assumptions.


THE PURPOSE of this study was to investigate
the relationship between self-concept and achieve- 
ment. For reasons of practicability, the vehicle 
for achievement in this study was reading im- 
provement on a college level. Reading improve- 
ment on this level as a subject for scientific in- 
vestigation has been replete with much speculation. 
Much of the literature in this area tends to report
research in terms of the cause of the reading dif- 
ficulty, and the factors involved in improvement 
(1,4,6,17). Some work has been done in attempt- 
ing to create theoretical frameworks with which 
to understand reading improvement (10,14). 
These theories, however, have not been tested 
successfully in their experimental setting. 

The theory tested in this study has its orienta- 
tion in the theory of self most succinctly integrat- 
ed by Rogers (12), and can be characterized by 
the statement by Lecky (7:153): 


Any value entering this system (of organ- 
ization of self valuation) which is inconsis- 
tent with the individual’s valuation of him- 
self cannot be assimilated: it meets with 
resistance and is likely, unless the general 
reorganization occurs, to be rejected. 


Theory and Propositions 





In this study the subject finds himself in a situ- 
ation which pressures him to change (a reading 
improvement program in which reading films of 
increasing speed are presented in each successive 
contact). The kind of change required is one from
within. This condition produces a force upon the 
individual about which he is expected to do some- 
thing. How open he is to this experience will de- 
termine how he perceives this demand. If he 
perceives this demand as a threat, he defends 
against this threat and maintains his self con- 
cept. If he does not see this as a threat he 
can change his self concept commensurate with 
and including the new experience. The former
approach would be expected to result in significantly
less change than the latter. One
can defend against this threat by distorting the experience
in such a way as to integrate very little,
or he may deny the experience by leaving the situation.
The proposition which was tested in this
study was that there will be significant differences 
among the self perceptions of the three groups in 
terms of their general defensiveness, Self as a 
Self, Self in Relation to Authority, Self as a Stu- 
dent, and Self as a Reader conception such that 
the three groups will appear in the following order
from most defensive to least defensive: Attrition,
Non-improver, and Improver.


Method 


The Sample 


The subjects were drawn from three reading
improvement classes at the University of Texas. 
These classes were not offered for credit and 
were purely voluntary on the part of the student. 
The only requirements were that the student read 
initially at the minimum rate of 250 words per 
minute and have a comprehension of 75 percent on 
the DRT. This approach tended to keep the group 
more homogeneous in terms of initial reading 
achievement. Other kinds of classes were provid- 
ed for those who did not meet these requirements. 
The groups which were studied were those who did 
pass the initial requirements. These classes met 
for a total of 14 one-hour sessions, during which 
increasingly speeded reading films were shown. 
These were followed by reading selections from a 
reading manual. After reading the film the stu- 
dents answered questions based on the contents of 
what they had read. This was done after they had 
read the selection in the manual. Both film and
manual were arranged in series so that the student
was required to read faster each time. Each student
kept a record of the speed of the film and the
comprehension achieved, and of the speed and
comprehension attained in reading the manual.
A short time was allowed each hour for group dis- 
cussion of the students’ reading habits. 


The Groups 


Although the classes consisted of 50 to 60 students each,
ranging widely in age, academic status, and
scholastic aptitude, only Freshman males and females
were used, because of the greater probability
of homogeneity. A sample of 54 was used, consisting
of nine female and 45 male subjects.

The subjects’ scores on the Diagnostic Read- 
ing Test before and after the program were con- 
verted into equivalent scores with use of the 
Graph of Equivalent Scores (11), which weights
speed and comprehension when considered together.
The weightings are such that comprehension
is more effective than speed for the lower rates
but the comprehension gets a heavier weighting 
for the higher weights. The scores on the pre- 
test were then converted into standard scores on 
posttest. A correlation between the pretest stand- 
ard and the posttest standard scores was .28 
which was not significant for the 54 subjects used. 
Inasmuch as the study was concerned with relative
improvement, Improvers were defined as
those subjects whose standard scores from pretest
to posttest increased, while Non-improvers
were those students whose standard scores from
pretest to posttest decreased. The Attrition
group contained those subjects who started the 
program but who discontinued before the seventh 
session which was the halfway point. 
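
The group definitions above can be written out as a small Python sketch. The function and argument names are hypothetical, and the handling of a subject whose standard score is exactly unchanged is an assumption, since the text does not say how such a case was treated.

```python
def classify_subject(pre_standard, post_standard, sessions_attended):
    """Assign a subject to one of the study's three groups.

    pre_standard / post_standard: standard scores on the pretest and
    posttest (post_standard may be None for a subject who dropped out).
    sessions_attended: number of reading sessions attended before
    discontinuing (14 for a subject who completed the program).
    """
    # Attrition: discontinued before the seventh (halfway) session.
    if sessions_attended < 7:
        return "Attrition"
    if post_standard > pre_standard:
        return "Improver"
    if post_standard < pre_standard:
        return "Non-improver"
    # Assumption: an exactly unchanged score is reported separately.
    return "Unchanged"
```

For example, a subject moving from a standard score of 0.2 to 0.5 after all 14 sessions would be labeled an Improver under this reading of the definitions.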


The Procedure 





1. The class met four times before the work
on reading improvement began.

A. During the first meeting the DRT was administered
and an orientation to the course
was given.

B. During the second meeting the DRT test results
were interpreted to the group and the
SCT administered.

C. During the third meeting the Self Sort was
administered.

D. During the fourth meeting the Ideal Sort
was administered.

E. Arrangements were made for those students
who did not take the measures described
above at the prescribed time.


2. The next 14 meetings were devoted exclusively
to reading improvement.


3. When the 14 meetings had terminated, four
more meetings were scheduled and procedures 1A to
1E were repeated.


The Instruments 





The Q Sort— The Q methodology employed in 
the present research involved a sample population 
and the Q sample which incorporates operational 
counterparts of the theory advanced. Although the
idea stems from Stephenson (15), the theory and
procedures for working with a structured Q sample
followed by subjects of the sample population
have been developed at the University of Texas by
McQuire, Phillips, and Peck (8), and by Freed
and McQuire (3).

The Q sample consisted of 80 self reference
statements selected from a larger universe of statements
which can be sorted "Self I am" and "Ideal
self." The "Self I am" is defined as the self as
perceived by the subjects at the time of the sorting
and the "Ideal self" as the self the subject would
like to be. Originally, the universe of Q statements
stemmed from a number of sources (2, 9, 13, 16)
and from new items constructed by the experimenter and
his colleagues.

The experimenter chose 80 items provisionally 
sorted as to dimension and orientation by five psy- 
chologists. There were two orientations: inten- 
sional and extensional; and four dimensions: Self 
as a Self, Self in Relation to Authority, Self as a 
Student, and Self as a Reader. These statements
were presented to judges who determined the placement
of each statement. Classification of the statements
was derived from the judges' opinions after
the placement of the items. Seventy-eight items
were agreed upon as to placement at the .02 level
of confidence. Two items were rewritten.

The sample was structured according to the balanced
design shown in Table I. The subjects were
asked to sort the 80 self reference statements
with the following number of items in each stack
assigned a value on the continuum from 1, "Least
like me," to 9, "Most like me."

A quasi-normal distribution of values of Q items
was designed such that $\sum x = 400$, $\sum x^2 = 2382$, $\bar{x} = 5$, and $c = 2000$.
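
The figures quoted for the forced distribution can be checked with a few lines of Python. This is only a sketch: it assumes that x is the value (1 to 9) assigned to an item, that the sums run over all 80 items, and that c is the usual correction term, the squared sum of x divided by the number of items. The last reading is an inference and is not stated in the text.

```python
N_ITEMS = 80    # statements in the Q sample
SUM_X = 400     # sum of assigned values, as quoted
SUM_X2 = 2382   # sum of squared assigned values, as quoted

mean_x = SUM_X / N_ITEMS               # mean assigned value
correction = SUM_X ** 2 / N_ITEMS      # assumed meaning of c
sum_sq_dev = SUM_X2 - correction       # sum of squared deviations
variance_x = sum_sq_dev / N_ITEMS      # variance with N divisor (assumed)
std_dev_x = variance_x ** 0.5          # roughly 2.19 under these assumptions

print(mean_x, correction, variance_x)
```

Under this reading the quoted mean of 5 and the value 2000 are mutually consistent, since 400 squared divided by 80 is 2000.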

Sentence Completion Technique— This tech- 
nique was developed from the statements in the Q 
sort. These statements were selected from each 
of the four dimensions of the sort. They were converted
into stems to be completed.

Responses to the stem stimuli were analyzed 
on a five-point scale from intensional (1) to exten- 
sional (5). Total scores for each subject were de- 
rived from the values assigned to each of the 32 
items. SCT protocols of 10 subjects were chosen 
randomly from the entire population to test the re- 
liability of the scoring system. Directions for 
scoring the SCT responses including definitions for 
intensionality and extensionality were given to two 
judges who then scored the responses. Table II 
contains rank order correlations which resulted 
from comparisons of scoring by the experimenter 
and the two judges. The correlations between the
experimenter's scoring of the 10 protocols and each of the two
judges' scoring of the same 10 protocols were significant
at the .05 level of confidence. The correlation
between the two judges' scoring approached
significance. Table II contains the intercorrelations
among the SCT total scores found by the two judges
and the experimenter.
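
A rank order correlation of the kind reported in Table II can be computed as in the following Python sketch. The text does not say which rank correlation coefficient was used; the sketch assumes Spearman's coefficient, computed as the product-moment correlation of the ranks with tied scores sharing their average rank. The function names are illustrative only.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(scores_a, scores_b):
    """Rank order correlation between two scorers' total scores."""
    return pearson(average_ranks(scores_a), average_ranks(scores_b))
```

Calling spearman with the total scores assigned by two scorers to the same 10 protocols would give one cell of a table such as Table II.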

Academic Factors—The students were compared
on a measure of academic aptitude, the American
Council on Education Psychological Examination,
and the following achievement measures:











TABLE I


DESIGN OF THE Q-SAMPLE


                       Areas of Self Reference

Orientation            Authority    Student    Reader

Extensional Items          10          10         10
Intensional Items          10          10         10

Total                      20          20         20





TABLE II


INTERCORRELATIONS AMONG SCT TOTAL SCORES ASSIGNED
BY TWO JUDGES AND THE EXPERIMENTER


                   Judge 1     Judge 2

Experimenter       . T10*       .64*
Judge 1              ...        .55

*Significant at the .05 level of confidence


The Cooperative English Test, vocabulary scores
from the Diagnostic Reading Test, grade-point
averages before and after the program, and a
measure of reading effectiveness (Equivalent
Scores) derived from DRT rate and comprehension
measures. Equivalent Scores before and after
the program were used for categorizing Improvers
and Non-improvers.


Results 


Inasmuch as the data in a study such as this
were quite cumbersome, the results will be presented
in summary form with reference to the appropriate
tables. It may also be noted that there was
much unforeseen difficulty in collecting the data
for study. As the test battery was very extensive,
it was impossible to test subjects at one sitting.
With the exception of the SCT no one measure was
taken by all subjects. Random selections of equal
numbers were necessary. The following is a
summary of results:


1. Self as a Self—The Attrition and Non-im- 
prover groups appeared to be more concerned 
with this area than the Improvers. The Attrition 
group identified least with the extensional items 
and the Non-improvers identified most with these 
items (Tables III and IV). 

The Improvers had the smallest discrepancies 
between the Self and Ideal sortings with both the 
intensional and extensional items. The Attrition 
group had the largest discrepancies with both the
intensional and extensional items whereas the Non- 
improvers showed discrepancies for only the in- 
tensional (Table V). 

In terms of defensiveness, the three groups ap- 
peared to be in the following order, from most de- 
fensive to least defensive: Attrition, Non-improv- 
er, and Improver. 

2. Self in Relation to Authority—In general, 
there appeared to be no real difference among the 
groups in this area. Some trends, however, may 
have meaning. Although there were similar con- 
cerns in the area, the Attrition group identi- 
fied with the intensional items more closely than 
the Improvers and Non-improvers. The Attrition 
group identified least with the extensional items 
while the Non-improvers showed greatest identi- 
fication with them (Tables III and IV). 

The groups were similar in terms of total Self-Ideal
discrepancies for the dimension. The Attrition
group showed least discrepancy for the intensional
items, while those of the Non-improvers were
most discrepant. The groups did not differ much
with respect to discrepancies resulting from extensional
items (Table V).

There was no clear pattern of defensiveness in
this area. Despite its failure to differentiate among
groups, the dimension appeared to be of concern
to all subjects.











3. Self as a Student— The Improvers seemed to 
be most concerned with this area. They were the 
most intensional and the most extensional. The 
differences among groups within the extensional 
orientations were negligible. The Attrition was 
the next most concerned with the intensional items. 
The Non-improvers were the least concerned with 
the intensional items (Tables III and IV). 

The Self-Ideal discrepancies were the same 
within the area for the Attrition and the Improver 
groups. They both had large discrepancies with 
respect to the intensional items. Discrepancies 
resulting from the responses with respect to the 
extensional statements differed only slightly from 
group to group. The Non-improvers were least 
defensive in this area. Both the Attrition and the 
Improvers were more defensive than the Non-im- 
provers (Table V). 

In terms of defensiveness the three groups appeared
to be in the following order from most defensive
to least defensive: Improvers, Attritions,
and Non-improvers.

4. Self as a Reader— The Improvers were most 
concerned with this area, the Non-improvers next 
and the Attrition group of least concern. The 
groups followed the same order in identification 
with the intensional items, while identification 
with the extensional items was greatest with Non-improvers
and least with the Attrition group (Tables
III and IV).

Although the total Self-Ideal discrepancies for
the area were not significant for all three groups,
the Non-improvers showed the greatest discrepan- 
cies with both the intensional and extensional items 
when orientations were separately considered (Ta- 
ble V). 

The Improvers were less defensive than the 
Non-improvers. The Improvers' high intensional
score was interpreted as an expression of their large
concern with the area. Essentially they placed almost
all reader items as "most like me" indiscriminately.
The Non-improvers did respond differentially
to the intensionality and the extensionality of
the items. This differential response is reflected 
in the Self-Ideal discrepancy which was largest 
for Non-improvers. The Attrition group’s lack of 
concern with the area could be interpreted as de- 
nial inasmuch as they tended to place all reader 
items, Self and Ideal in the ‘‘least like me’’ cate- 
gory. 

In terms of defensiveness, the three groups appeared
to be in the following order from most defensive
to least defensive: Attrition, Non-improver,
and Improver.

5. General Defensiveness—General overall de- 
fensiveness was measured by the three different 
means: discrepancy derived from Self and Ideal 
sorting, Self-Ideal correlations, and the SCT. 
From all three the results indicated distinct lev- 
els of defensiveness. The Attrition group showed 
more general defensiveness than either of the two 














TABLE III


ANALYSIS OF VARIANCE IN Q-VALUES FOR 36 SUBJECTS ON SELF SORTS BEFORE
INITIATION OF PROGRAM


Source of Variation            Sum of Squares    Mean Square

Independent Variations
  Samples (S)                       nil
  Persons in Samples (SP)           nil
  Sum

Orientations (O)
  Orientations (O)                 718.00           718.00       .01 (O/OSP)
  Interaction (OS)                  25.19            12.60
  Deviation (OSP)                  319.51             9.68       .01 (OSP/R)
  Sum                             1062.70

Dimensions (D)
  Dimensions (D)                   229.14            76.38
  Interaction (DS)                 258.29            43.04       .01 (DS/R)
  Deviation (DSP)                  540.61             5.46       .05 (DSP/R)
  Sum                             1028.04

Dimensions and Orientations (OD)
  Interaction (OD)                  91.12            30.37
  Interaction (ODS)                109.28            18.21       .01 (ODS/ODSP)
  Deviation (ODSP)                 483.62             4.89
  Sum                              684.02

Replications (R)                 10976.90

Total Variations                 13752.00


TABLE IV


SUMMARY OF MEAN Q-VALUES FOR DIMENSIONS, ORIENTATIONS, AND 
SAMPLES FOR SELF SORTS BEFORE INITIATION OF PROGRAM 





Groups 


Self 


Authority Student 


Reader 





Improvers 
Non-improvers 
Attritions 


Total Means 


.21 
. 50 
. 50 


. 40 


Dimension and Sample 
5.10 
4. 81 
4.96 


4. 96 





Improvers 
Non-improvers 
Attritions 


Total Means 


Intensional Orientation, Dimension, Sample 


. 34 
~77 
.07 


. 73 


4.61 
4. 34 
4. 40 


4.45 





Improvers 
Non-improvers 
Attritions 


Total Means 


Extensional Orientation, Dimension, Sample 


.07 
. 23 
. 92 


.07 


5. 64 
5. 76 
5. 44 


5.61


TABLE VI


MEAN SELF-IDEAL CORRELATIONS FOR THREE GROUPS ON PRE-TEST


Improvers      Non-improvers      Attritions

   .56              .55              .29





TABLE VII


t-TESTS OF SIGNIFICANCE OF MEAN SELF-IDEAL CORRELATIONS
AMONG THE THREE GROUPS


Groups             Non-improvers     Attrition

Improvers              0.07            2.11*
Non-improvers           ...            2.04*

*Significant at the .05 level of confidence


TABLE VIII


CHI SQUARE ANALYSIS OF SELF-IDEAL CORRELATIONS
AMONG THE THREE GROUPS


Groups             Non-improvers     Attrition

Improvers              0.00            3.24*
Non-improvers           ...            3.24*

*Significant at the .10 level of confidence





TABLE IX


t-TESTS OF SIGNIFICANCE AMONG THE MEAN SCT SCORES
OF THE THREE GROUPS


Groups             Non-improvers     Attrition

Improvers              2.12            4.16*
Non-improvers           ...            2.28**

* Significant at the .01 level of confidence
**Significant at the .05 level of confidence


TABLE X


CHI SQUARE ANALYSIS AMONG THE MEAN SCT SCORES
OF THE THREE GROUPS


Groups             Non-improvers     Attrition

Improvers              1.16            7.28*
Non-improvers           ...            2.14**

* Significant at the .01 level of confidence
**Significant at the .05 level of confidence


TABLE XI


MEAN SELF-IDEAL CORRELATIONS BEFORE AND AFTER
THE PROGRAM


                   Before

Improvers           .56
Non-improvers       .55


TABLE XII


CHANGES IN SELF-IDEAL DISCREPANCIES IN MEAN Q-VALUES FROM PRE- TO POST-SORTINGS
FOR THE THREE GROUPS


                                            Dimensions

                              Improvers                              Non-Improvers
                     Self   Authority  Student  Reader      Self   Authority  Student  Reader

Intensional Items    0.09*    0.32      0.85    -0.10**    -0.35     -0.27      0.18    -0.24
Extensional Items   -0.06     0.71      0.04     0.26      -0.16     -0.04     -0.10    -0.32

Total Change         0.03     1.03      0.89     0.16      -0.19     -0.31      0.08    -0.56

* A plus (+) sign indicates that the differences in mean Q-values have increased from pre- to
  post-sortings, representing an increase in Self-Ideal discrepancy.
**A minus (-) sign indicates a decrease in Self-Ideal discrepancy.


TABLE XIII


MEAN CORRELATIONS OF PRE- AND POST-SORTS


Improvers
Non-improvers      .64*

*Significant at the .01 level of confidence.


TABLE XVII


ANALYSIS OF VARIANCES IN Q-VALUES FOR MALE-FEMALE ON SELF
SORT BEFORE INITIATION OF PROGRAM


Source of Variation             df    Sum of Squares    Mean Square

Independent Variation
  Samples (S)                              nil
  Persons in Samples (SP)                  nil
  Sum

Orientation (O)
  Orientation (O)                         198.02           198.02       .01 (O/OSP)
  Interaction (OS)                          1.48             1.48       ...
  Deviation (OSP)                         221.25            13.83       .01 (OSP/R)
  Sum                                     420.75

Dimension (D)
  Dimension (D)                  3         44.97            14.99       .01 (D/DSP)
  Interaction (DS)                          8.83             2.94       ...
  Deviation (DSP)                         246.30             5.13       ... (DSP/R)
  Sum                                     300.10

Dimension and Orientation (OD)
  Interaction (OD)               3         38.69            12.90       .05 (OD/ODSP)
  Interaction (ODS)              3         14.95             4.98       ...
  Deviation (ODSP)              48        251.36             5.24       ... (ODSP/R)
  Sum                           54        305.00

Replication (R)                          5850.15

Total Variation                          6876


groups. The Non-improver, though not exhibiting
as much defensiveness in this respect as the Attrition
sample, did show more than the Improvers
(Tables V, VI, VII, VIII, IX, X).
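
Two of the three defensiveness measures named above can be sketched in Python. The text does not give the computing formulas, so the sketch makes two assumptions: the Self-Ideal correlation is taken to be the product-moment correlation between the values a subject assigns to the same items in the Self and Ideal sorts, and the Self-Ideal discrepancy for a set of items is taken to be the absolute difference between their mean Q-values in the two sorts. The function names are illustrative only.

```python
def self_ideal_correlation(self_sort, ideal_sort):
    """Product-moment correlation between a subject's two sorts.

    self_sort / ideal_sort: values (1 to 9) assigned to the same items,
    listed in the same item order.
    """
    n = len(self_sort)
    mean_s = sum(self_sort) / n
    mean_i = sum(ideal_sort) / n
    cross = sum((s - mean_s) * (i - mean_i) for s, i in zip(self_sort, ideal_sort))
    ss_self = sum((s - mean_s) ** 2 for s in self_sort)
    ss_ideal = sum((i - mean_i) ** 2 for i in ideal_sort)
    return cross / (ss_self * ss_ideal) ** 0.5


def self_ideal_discrepancy(self_sort, ideal_sort, item_indices):
    """Absolute difference in mean Q-value over a chosen set of items.

    item_indices: positions of the items belonging to one dimension and
    orientation (for example, the intensional Reader items).
    """
    k = len(item_indices)
    mean_self = sum(self_sort[j] for j in item_indices) / k
    mean_ideal = sum(ideal_sort[j] for j in item_indices) / k
    return abs(mean_ideal - mean_self)
```

On this reading, a low correlation or a large discrepancy indicates that the subject's perceived self and ideal self are placed far apart.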

6. Reliability of the Sort—The correlations of 
the Self sorts before and after the program (Table 
XIII), and of the Ideal sorts before and after the 
program (Table XIII), indicated a consistency well 
above that expected by chance. Self-Ideal correlations
before and after the program (Table XI),
however, suggested that this consistency did not
apply to Improvers but it did to the Non-improvers.
The former indicated a considerable drop
in correlations while the latter showed little
change. Improvers further showed an increased
discrepancy in the areas in which they were most
defensive: Self in Relation to Authority and Self as
a Student. The Non-improvers, although indicating
little change, did alter in the direction of a decrease,
especially in the areas where they had
been most defensive: Self as a Self and Self as a
Reader (Table XII).

7. Validity of the Sort Sample—The analysis 
of the SCT yielded relationships among the three 
groups studied which were very similar to those 
found by the analyses of the Self and Ideal sorts 
(Tables IX and X). This cross validation supports
the probability that both instruments were meas- 
uring what they were designed to measure. 

8. Academic Differences—An analysis of the 
subjects’ performance on various aptitude and 
achievement measures indicated that the Non-im- 
provers scored much higher than the other two 
groups (Tables XIV and XV). 

The Improvers and the Attrition cases resembled
each other in this respect. This relationship
held true also with respect to entering reading effectiveness
as measured by the Equivalent Scores.
At the end of the program this relationship became
reversed as the Improvers improved while
the Non-improvers not only failed to improve
but lost ground in terms of the measure of reading
skill.

The grade-point average indicated that although
the Improvers had significantly lower scores than
the Non-improvers on the academic and achievement
measures, they had a high grade-point average.
The Attrition group had the lowest achievement
record. At the end of the program, the Improvers
dropped to a grade-point average similar to that
of the Attrition sample, but the Non-improvers
and the Attrition group increased slightly in terms
of grade-point average.

9. Sex Differences—An analysis was made of 
the academic factors (Table XVI), and the per- 
formance on the Self and Ideal sorts of the male 
and female (Table XVII). No significant sex dif- 
ferences were found. 

















Discussion 


The purpose of this study was to investigate the 
relationship between self concept and achievement 
as demonstrated by Freshmen who came voluntarily
to a college reading improvement program. Although
the differential effects of the various self-concepts
were thought by the experimenter and his
colleagues to be operant in academic learning situations
of all kinds, the reading improvement program
was chosen because of proximity and availability.
Inasmuch as a program such as this one
"pressures" an individual to change his patterns
of reading, he would be expected to do something
about this pressure. He could be "open" to the
situation and change his reading habits in relation 
to the demands of the situation, or on the other 
hand, he could avoid meeting the demand to change 
by either distorting the situation by his perceptions 
of it, or by denying it entirely. According to Rogerian
self theory, he would deny or distort the experience
as a defense against an inconsistency
with self concept. He would consider it more im- 
portant to maintain a conception of self than to in- 
tegrate experiences which might necessitate chang- 
ing the concept. This condition occurs when the 
self concept is used as a defense against threat. 
The theory, thus proposed, is that with all other 
factors equal, those who did something construc- 
tively from the experience, would demonstrate 
less defensiveness in their concept of self as a read- 
er than those who did not do as well. If an individ-
ual’s concept of self as a reader were defensive,
it would be an expression of a more permeating
defense system which would be manifested in self 
concepts other than that of reading. Three other 
areas were therefore studied. 

Defensiveness, in this investigation, refers to
self-perceptions which are distortions in terms of
concepts that are absolute, unconditional, unlimit-
ed, or over generalized. Defensiveness is an at-
tempt to maintain a concept of self. The anticipa-
tion of an experience contrary to this tends to make 
the individual more adamant in his conception and 
hence more unrealistic. 


Findings 


The proposition tested by this study was that
there would be significant differences among the
self perceptions of the Improver, the Non-improv-
er, and the Attrition groups. This basic proposi-
tion has found support in terms of most of the five
subsidiary propositions and other significant re-
sults have been obtained which supported the prop-
ositions.

Proposition 1: There will be significant differ- 
ences in the amount of general defensiveness such 
that the three groups will appear in the following
order from most defensive to least defensive: At-
trition, Non-improver, and Improver. 

The data from three different measures tend- 
ed to support this proposition: the Self and Ideal 
sorts, the Self-Ideal correlations and the Sentence 
Completion Technique. The three groups did ap- 
pear in the order predicted. 

Proposition 2: There will be significant differ- 
ences in the amount of defensiveness in the self 
conception such that the three groups will appear 
in the following order from most defensive to
least defensive: Attrition, Non-improver and Im- 
prover. 

The data from the Self sorts and the Self-Ideal 
discrepancies clearly indicated support for this 
proposition. Inasmuch as this dimension was in- 
tended as a means for measuring the general con- 
cept, both propositions 1 and 2 were consistently 
supported by the data. 

Proposition 3: There will be significant differ- 
ences in the amount of defensiveness in the Self 
as a Reader conception such that the three groups 
will appear in the following order from most de- 
fensive to least defensive: Attrition, Non-improv- 
er, and Improver. 

The data from the Self sorts and the Self-Ideal 
discrepancy measure supported this proposition. 
This proposition was the major premise of the 
study and it was particularly meaningful that the 
defensiveness of this dimension did coincide with 
the relative performance of subjects in the pro- 
gram itself. In this proposition and the next, the 
theory was tested against observed behavior. 

Proposition 4: There will be significant differ- 
ences in the amount of defensiveness in the Self as
a Student conception such that the three groups
will appear in the following order from most de-
fensive to least defensive: Attrition, Non-im-
prover, and Improver.

This proposition was not supported by the data.
In terms of defensiveness indicated by the Self
sorts and the Self-Ideal discrepancies, the Non-
improvers were the least defensive while the Attri-
tion and Improver groups were both more defen-
sive. This finding has significant meaning in a re-
evaluation of the theory presented in the light of
other findings which will be discussed later.

Proposition 5: There will be significant differ- 
ences in the amount of defensiveness in the Self in 
Relation to Authority dimensions such that the 
three groups will appear in the following order 
from most defensive to least defensive: Attrition, 
Non-improver, and Improver. 

This proposition also found no support from the 
data. There were no apparent differences among 
the groups in this dimension. This was evidently 
an area of defensiveness for all three groups. Al- 
though the groups were not differentiated with re- 
spect to this concept, further study in this area 
may indicate differences between those students
who volunteer for a reading improvement program
and those who do not.

Other findings were:


1. Stability of the Self-Ideal correlations from
before the program to after it was found for Non-
improvers but not for Improvers. The latter group
exhibited a decrease in Self-Ideal correlations.
Upon further inspection this decrease was found to
be attributable to an increased defensiveness in
those areas in which this group had first appeared to
be defensive, namely, Self in Relation to Author-
ity and Self as a Student. Although there was little
change in the Non-improver group, almost all the
changes were in the direction of decrease in defen-
siveness, particularly in the areas of greatest de-
fense, which were Self as a Reader and Self as a
Self.

2. In almost all the aptitude and achievement
measures used, the Non-improvers were signifi-
cantly superior to the other two groups in terms
of scores. Where there were no significant differ-
ences there was a strong tendency toward signifi-
cance. This was true as well for the entering
Equivalent Scores which measured reading effec-
tiveness. At the end of the program the Non-im-
provers were lower than they had been when they
had started.

3. In terms of the grade-point average, the Im- 
provers had the highest for the semester previous 
to the one in which they had enrolled in the pro- 
gram. The grades declined during the semester 
in which the program was taken while the GPA of 
the other groups increased slightly. 


Re-evaluation of the Theory 





For the most part, the theory proposed by this 
study has held. Apparently there was a direct re- 
lationship between defensiveness in the self con- 
cept as a reader and relative performance in the 
reading improvement situation. There seemed al- 
so to be a relationship between defensiveness of 
the perception of self in this area and general
defensiveness, which was expressed as relative defensive-
ness among a group of students who had volun- 
teered for the course. How representative these 
students were of the college population was diffi- 
cult to say. The nature of volunteering for the 
class itself may have been a selective device. For
one reason or another those who did volunteer had
an investment in their performance in the pro- 
gram above and beyond the reading itself. This 
phenomenon may have been related to the Self in 
Relation to Authority conception. In this dimen- 
sion the groups did not differ, but all indicated de- 
fensiveness. Although an explanation for this is 
not feasible on the basis of the data collected,
there was some indication that selectiveness was
involved. The Attrition group, the most defensive
throughout, resorted to the defense of denial which
would insure against any change in achievement,
self concept, or anything else. The Improver 
and Non-improver groups, however, presented 
some interesting phenomena. The Non-improver 
differed from the Improver in self concept in that
the areas of defense were Self as a Reader and 
Self as a Self. The only changes they experienced 
were a loss in reading effectiveness and a tenden- 
cy to be a little less defensive throughout, espec- 
ially in their most defensive areas. Hogan (5) 
explains that defense decreases when the threat 
is removed from the individual. In effect then, 
the Non-improvers entered the program in which 
reading improvement was a threat to the concept 
of self as a reader. In a sense, in this group, 
the threat was defeated. They went to the pro- 
gram faithfully and were able to reify their con- 
ception to themselves. The situation for the mo- 
ment had ended. The defense could then be re- 
laxed. 

On the other hand, the Improvers’ area of de- 
fense was self as a student. For them it was im- 
portant to commit themselves to the reification of 
their defense in the concept of self in the student 
area. Unlike the Non-improvers, however, the 
end of the program did not eliminate the threat. 
As Hogan claimed (5: 420): 


Defense requires further defense since 
threat is not resolved by defense unless 
challenge is ceased or the threat resolved. 


The student area was not directly affected by
the reading experiences except to produce further
threat. That is, with increased reading effective-
ness, studying should become more effective. This
idea was inconsistent with the defensive self con- 
cept. Another defense or increasing defensive- 
ness in this area was required. The data support- 
ed this idea. Not only did the grades drop but 
their post sorts indicated an increased defensive- 
ness in the areas of greatest defense, namely, 
Self as a Student and Self in Relation to Authority. 

The data in this study clearly indicated that not 
only is self concept related to achievement, but 
that, in terms of their conception of self, individ- 
uals have a definite investment to perform as they
do. With all things being equal, those who do not 
achieve choose not to do so, while those who do 
achieve, choose to do so. 


Summary 


This study attempted to test the proposition
that there would be significant differences in the
self perceptions of those who improved, did not
improve, and dropped out in a college reading im-
provement program. The data tended to support
the proposition. Other findings such as changes
in self concept and grade-point average indicated
further support for the theory that those who
achieve as well as those who do not, do so as a re-
sult of the needs of their own self system.


THE CURRICULUM has been the subject of
much study and experimentation during the past
three decades. Many curricular patterns have
been developed as educational workers have tried
to translate the newer concepts of curriculum, in
terms of the learning experiences of the child, 
into functional forms for school programs. A- 
mong these proposals has been the core curricu- 
lum. Briefly, the ‘‘core curriculum’’ has been 
defined as ‘‘that part of the total school curricu- 
lum which endeavors to assist pupils in meeting 
the needs most common to them and to society 
without regard to any subject-matter classifica- 
tion.’’¹ Since the Eight Year Study² of the Pro-
gressive Education Association, which affirmed 
that the secondary school could provide a sound, 
flexible education to meet pupils’ present and 
future life needs, there has been much education- 
al interest in the effects of curricular experi- 
mentation upon the high school students’ prepa- 
ration for and adjustment to the demands of col- 
lege life. 

The core curriculum at the Highland Park (Il- 
linois) High School was established as an elective 
program in 1943. The subject matter fields util- 
ized during the last twelve years have been: 
Grade IX, English and social science; Grade X, 
English-speech and biology; Grade XI, American 
history and American literature; and Grade XII, 
English (great books). The present investigation 
was a longitudinal study of the graduates of the 
core curriculum during their matriculation at 
colleges, using graduates of the conventional
curriculum in the same high school as a control
group.

*Footnotes will be found at the end of this article.


Purpose


The purpose of the study was to compare the
college experiences of selected graduates from
the core curriculum and from the conventional
curriculum at the Highland Park, Illinois, High
School. The investigation sought comparative 
data in relation to the following general criteria: 
(1) college acceptance and matriculation, (2) aca- 
demic preparation for college, (3) scholastic 
achievements in college, and (4) extracurricular 
experiences at college. Specific aspects of college 
experiences included in the analyses of the two 
curricular samples were: acceptances for admis- 
sions, types and sizes of colleges attended, gen- 
eral academic preparation, first semester’s 
grades, estimated scholastic rank, academic 
honors attained, membership in fraternities or 
sororities, participation in campus activities and
organizations, and the achievement of special ex- 
tracurricular recognitions. 


Procedures 


The subjects utilized in the study were select-
ed at random from the graduates of the core cur-
riculum and from the graduates of the convention-
al curriculum at the high school during the period
1947-1952. The samples, which included 239
graduates in each curricular group, were equated
on the bases of sex, year of graduation, and
intelligence. Information concerning grades
achieved during the first semester’s attendance at
college was obtained from the graduates’ cumu-
lative records at the high school. Other data were
secured from questionnaires sent to the graduates
in October, 1954. The questionnaire survey ef-
fected a fifty-four percent response. As to the
representativeness of the questionnaire sample,
statistical calculations revealed no significant
differences between the respondent sample and the
surveyed population on five criteria common to
both groups. The data from the questionnaires
were transferred to I.B.M. cards for tabulations,
which were subsequently analyzed statistically.


Summary of Findings





Comparative analyses of the data secured
from the high school records of the graduates
and the respondents in the questionnaire survey
revealed no appreciable differences between the
core and the non-core samples concerning the
college experiences selected for the study. In
general, the incidences of college acceptances
for the two curricular groups had been quite sim-
ilar, the matriculative patterns in regard to the
types and the sizes of colleges attended had been
somewhat congruous, the college preparation of
the core and non-core groups had been very ac-
cordant, and the scholastic and the social a-
chievements of the two samples had been quite
comparable. The specific findings will be dis-
cussed under the following captions: (1) College
acceptance and matriculation, (2) Characteris-
tics of colleges attended, (3) Preparation for col-
lege, (4) Scholastic achievements, and (5) Extra-
curricular experiences.


College Acceptance and Matriculation 





The questionnaire survey revealed that ninety-
seven percent of the core respondents and ninety-
six percent of the non-core respondents had at-
tended or were attending colleges or universities
at the time of the study. Table I specifies a per-
centage comparison of the number of acceptances
for admission to colleges. The graduates in each
curricular sample have been divided into the sub-
groups of 1947-1949, 1950-1951, and 1952-1953.
The respective percents of the total core and
non-core samples as related to the quantity of
college acceptances were: one, 11.0 and 10.3;
two, 17.0 and 14.2; three, 21.0 and 29.2; four,
13.0 and 9.1; five, 2.0 and 5.7; and six or more,
1.0 and .9.


Characteristics of Colleges Attended 





General Types--What general types of col-
leges had the graduates selected for matricula-
tion? Table II includes a percentage distribution
of the colleges attended by the core and the non-
core respondents according to general types. The
respective percents of matriculation by the core
graduates and by the non-core graduates were:
liberal arts, 31.0 and 34.9; men’s college, 8.0
and 10.1; women’s college, 10.0 and 14.1; pri-
vate church school, 1.0 and .9; state college or
university, 40.0 and 32.2; normal school or
teachers’ college, 4.0 and .9; and technological
institution, 3.0 and 2.8.





Sizes as determined by student population--
How did the members of the two curricular
samples compare in terms of enrollment at
small or large colleges? Table III specifies a
percentage comparison of the core and the non-
core graduates according to the size classifica-
tions of colleges attended. The respective per-
cents of enrollment for the core and the non-core
graduates in colleges as classified by student pop-
ulation were: under 500, 7.0 and 7.3; 500-1,999,
25.0 and 37.8; 2,000-4,999, 12.0 and 10.3; 5,000-
7,999, 13.0 and 9.3; 8,000-11,999, 13.0 and 14.1;
12,000-15,999, 12.0 and 9.3; and 16,000 and o-
ver, 13.0 and 11.3.


Preparation for College 





The information presented in this section will 
be focused on selected academic experiences of 
the graduates as revealed by questionnaire items. 
Answers to these questions were sought. How ad-
equately had the graduates in each curricular
sample been prepared for college work? What 
had been the areas of deficiency during participa- 
tion in college English courses? How well had 
they achieved in these English courses? 

General preparation for college-- Table IV 
presents a percentage comparison of the core and 
the non-core graduates concerning their opinions 
as to their general academic preparation for col- 
lege. Only four graduates in each curricular 
sample indicated that their preparation was inad-
equate. The respective percents for the core and
the non-core graduates, according to their opin-
ions as to degree of academic preparedness for
college, were: superior, 23.0 and 25.5; very ad-
equate, 38.0 and 45.4; and adequate, 27.0 and 
21.7. Thus, 88.0 percent of the core graduates 
and 91.6 percent of the non-core graduates felt 
that they had been adequately or more than ade- 
quately prepared academically for college. In 
general, the findings indicated very little differ- 
ence on this factor for the two groups. 

Areas of deficiency in college English courses--
How well had the respondents been prepared for
college in a basic course such as English? Since 
this subject had been common to the graduates of 
both curricular samples during their four years 
at the High School, the investigator assumed that 
this factor might have particular significance in 
the evaluation of the Core Curriculum. Table V 
specifies a percentage comparison of core and 
non-core graduates according to the areas of in- 
adequate preparation for college English courses. 
The percents of mention as indicated by the grad- 
uates from the core program and from the con-
ventional curriculum were, respectively: vocab-
ulary, 17.0 and 6.5; oral activities, 5.0 and 2.7; 
mechanics of grammar, 33.0 and 31.1; diction or 
word usage, 7.0 and .9; general reference book 
skills, 2.0 and 2.8; sentence structure, 14.0 and 
14.1; formal writing, 10.0 and 19.8; creative 
writing, 14.0 and 23.7; no deficiencies, 9.0 and 
12.3; and, other inadequacies, 5.0 and 6.5. 

Thus, the findings were somewhat similar for 

TABLE I


A PERCENTAGE COMPARISON OF CORE AND NON-CORE
GRADUATES ACCORDING TO NUMBER OF ACCEPTANCES
FOR ADMISSIONS TO COLLEGES


                      Core Graduates*              Non-Core Graduates**
Number of         1947-  1950-  1952-  Total     1947-  1950-  1952-  Total
acceptances        1949   1951   1953 Sample      1949   1951   1953 Sample

One                 3.7   16.0   12.5   11.0       9.2    7.7   13.9   10.3
Two                25.9    4.0   18.8   17.0      15.9    7.7   16.7   14.2
Three              18.5   28.0   18.8   21.0      27.2   19.2   39.0   29.2
Four                7.4   16.0   14.6   13.0       6.8   11.6   11.1    9.1
Five                ---    4.0    2.1    2.0       4.6    7.7    5.6    5.7
Six or more         ---    ---    2.1    1.0       ---    3.8    ---     .9


* 35.0 percent of the sample did not respond.
** 30.2 percent of the sample did not respond.

TABLE II


A PERCENTAGE COMPARISON OF CORE AND NON-CORE 
GRADUATES ACCORDING TO GENERAL TYPES 
OF COLLEGES ATTENDED 








Type of Core Graduates* Non-Core Graduates** 
College 1947- 1950- 1952- Total 1947- 1950- 1952- Total 
Attended 1949 1951 1953 Sample 1949 1951 1953 Sample 











Liberalarts 48.1 28.0 22.9 31.0 45.4 34.6 22.2 34.9 


Men’s college --- 16.0 8.3 8.0 a2 8.6 6.5 Mi 


Women’s 
college 7.4 20.0 3 . . 3.8 R 14.1 


Private church 
school 4.0 , 3.8 


State college 
or university 33.3 


Normal school 


or teachers’ 
college 3.7 


Technological 
institution 3.7 


Other 





* 7.0 percent of the sample did not respond.
** 3.7 percent of the sample did not respond.

TABLE III


A PERCENTAGE COMPARISON OF CORE AND NON-CORE
GRADUATES ACCORDING TO SIZES OF COLLEGES
OR UNIVERSITIES ATTENDED


Student               Core Graduates*              Non-Core Graduates**
Population        1947-  1950-  1952-  Total     1947-  1950-  1952-  Total
of Colleges        1949   1951   1953 Sample      1949   1951   1953 Sample

Under 500           3.7   12.0    6.3    7.0       6.8   11.6    5.6    7.3
500-1,999                 40.0   16.7   25.0      50.0   30.7   27.8   37.8
2,000-4,999         7.4   12.0   14.6   12.0       9.2    3.8   16.7   10.3
5,000-7,999        14.8   24.0    6.3   13.0       9.2           5.6    9.3
8,000-11,999       22.2    8.0   10.4   13.0      11.4
12,000-15,999       3.7    4.0   20.8   12.0       6.8
16,000 and over    11.1    4.0   18.8   13.0      11.4


* 7.0 percent of the sample did not respond.
** 4.0 percent of the sample did not respond.


TABLE IV


A PERCENTAGE COMPARISON OF CORE AND NON-CORE
GRADUATES CONCERNING THEIR OPINIONS AS TO
GENERAL ACADEMIC PREPARATION FOR COLLEGE


Preparation           Core Graduates*              Non-Core Graduates**
for               1947-  1950-  1952-  Total     1947-  1950-  1952-  Total
College            1949   1951   1953 Sample      1949   1951   1953 Sample

Superior           25.9          18.8   23.0      20.5   42.3   19.4
Very Adequate      44.4          35.4   38.0      54.5   34.6   41.7
Adequate                         31.3   27.0      18.4   15.4
Inadequate                        6.3    3.0       4.6
Very Inadequate                   2.1    1.0


* 8.0 percent of the sample did not respond.
** 3.7 percent of the sample did not respond.

TABLE V


A PERCENTAGE COMPARISON OF CORE AND NON-CORE GRADUATES 
ACCORDING TO AREAS OF INADEQUATE PREPARATION 
FOR COLLEGE ENGLISH COURSES 





Area of Least Core Graduates* Non-Core Graduates** 
Adequate 1947- 1950- 1952- Total 1947- 1950- 1952- Total 
Preparation 1949 1951 1953 Sample 1949 1951 1953 Sample 











Vocabulary 4.8 24. 14.6 17.0 \ 7.7 5.6 6.5 
Oral Activities 7.4 4. 4.2 5.0 . 2.8 2.7 
Mechanics of Grammar ’ . 35. 33.0 31.1 
Diction or Word Usage . 10. 7.0 ° 9 


General Reference 
Book Skills r a 2.0 


2.8 
Sentence Structure 

Formal Writing 8. 

Creative Writing . 12. 


No Deficiencies : 8. 


Other Inadequacies 3.7 12. 





* 16.0 percent of the sample did not respond. 
** 7.6 percent of the sample did not respond. 

the two samples except for perceptible deviations
in four areas. Evidently, the core graduates had
felt fewer inadequacies in creative writing and in
formal writing, while the non-core graduates had
been less deficient in vocabulary and in diction or
word usage. The most discernible deficiencies
had been in mechanics of grammar, which were
specified as inadequacies by about one third in
each sample. Other significant areas of concern
for both groups had been sentence structure, vo-
cabulary, creative writing, and formal writing.


Grades achieved in English courses as college 
Freshmen-- What had been the relative achieve- 
ments of the core and the non-core respondents 
in English courses as college Freshmen? Ac- 
cording to Table VI, the respective percents for 
the core and the non-core graduates, in relevance 
to grades achieved in college English courses 
during their first year, were: A or equivalent, 
14.0 and 16.0; B or equivalent, 38.0 and 34.0; C
or equivalent, 28.0 and 36.0; D or equivalent, 3.0
and 4.6; failure, 0.0 and .9; excused from the
course because of significant scores on tests, 10.0 
and 3.8; and course not required, 1.0 and .9. 

In general, the achievement patterns for the
two samples had been somewhat consistent. Of
the 83.0 percent of the core respondents who had
taken English courses, only 3.0 percent had a-
chieved less than a grade of C or its equivalent.
In the non-core sample, 5.5 percent of the grad-
uates who had been enrolled in Freshman English
courses attained less than a C grade or the equi-
valent. Every tenth core respondent had been ex-
cused from the required English course because
of a significant test score; this factor had been
true for the non-core sample in about one case in
every twenty-five. The percentage findings indi-
cated that the non-core respondents had tended to
receive more A and C grades than had the core
graduates, while the respondents from the core
sample had achieved more grades in the B clas-
sification.
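
The percentages cited for Freshman English can be checked by simple addition. The following is a minimal sketch in Python, assuming the Total Sample columns of Table VI (A, 14.0 and 16.0; B, 38.0 and 34.0; C, 28.0 and 36.0; D, 3.0 and 4.6; failure, 0.0 and .9). The dictionary and function names are illustrative assumptions and are not part of the original analysis.

```python
# Total Sample percentages transcribed from Table VI (letter grades only).
# Names below are hypothetical, chosen for this illustration.
CORE_GRADES = {"A": 14.0, "B": 38.0, "C": 28.0, "D": 3.0, "F": 0.0}
NON_CORE_GRADES = {"A": 16.0, "B": 34.0, "C": 36.0, "D": 4.6, "F": 0.9}


def percent_graded(grades):
    """Percent of the sample that received any letter grade in the course."""
    return round(sum(grades.values()), 1)


def percent_below_c(grades):
    """Percent of the sample that received less than a C (D or failure)."""
    return round(grades["D"] + grades["F"], 1)


if __name__ == "__main__":
    for label, grades in (("core", CORE_GRADES), ("non-core", NON_CORE_GRADES)):
        print(label, percent_graded(grades), percent_below_c(grades))
```

Under these assumptions the core figures reproduce the 83.0 percent who took the course and the 3.0 percent below C, and the non-core figure below C reproduces the 5.5 percent given in the text.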


Scholastic Achievements 





How had the scholastic achievements for the 
core and the non-core graduates compared during 
their college matriculation? To seek an answer 
to this question, the following criteria were used: 
(1) first semester’s grades as reported to the High
School by the colleges, (2) estimated quartile
ranks by the graduates, and (3) scholastic honors
attained. The data for the first topic were secur-
ed from the cumulative High School records,
while the last two items were included in the 
questionnaire. 

First semester’s grades as reported to the High
School by the colleges--How had the scholastic
achievements of the core and the non-core grad-
uates compared during their first semester of col-
lege work as revealed by grades? In seeking these
data, the investigator premised that the academic
influences of the High School had carried, for the
most part, through the first semester’s work at
the college level; after that, possibly, the schol-
astic forces in the college environment had tended
to become stronger. Table VII, which follows, e-
numerates a comparison of the mean grades a-
chieved by the core and the non-core graduates as
college Freshmen during their first semester.
Since English, science, and social science had
been common to both curricular samples during
high school attendance, these subjects were se-
lected for comparative purposes. The fourth entry
specified the mean grades for all of the subjects
in which the graduates had been enrolled.

In general, the findings revealed that the ranges
of the mean grades for the subgroups in the core
curriculum sample were: English, 3.3-3.4; sci-
ence, 3.2-3.5; social science, 3.3-3.5; and all
subjects, 3.2-3.5. Corresponding data for the
subgroups in the regular curriculum sample were:
English, 3.1-3.5; science, 3.3-3.5; social sci-
ence, 3.3-3.4; and all subjects, 3.4-3.9. The pat-
terns of achievement in English, science, and soc-
ial science were very consistent for the core and
non-core graduates since the differences in mean
grades ranged from zero to .2. The deviations in
the mean grades for all subjects were more mark-
ed inasmuch as the differences ranged from .1 to
.4.

In the ‘‘all subjects’’ classification and in sci-
ence, the small positive differences in the means
had been in favor of the non-core graduates, while
the core graduates had held a slight advantage in
English and in social science. In general, the dif-
ferences in the achievements for the two groups
had been very slight; collectively, their scholastic
attainments in English, science, social science,
and ‘‘all subjects’’ represented slightly above ‘‘av-
erage’’ achievement or a grade of C as originally
symbolized.
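
The differences reported in Part II of Table VII follow directly from the mean grades in Part I. The sketch below, written in Python, recomputes them. It assumes the six subgroup means per subject as they appear in Table VII; the names `CORE`, `NON_CORE`, and `mean_differences` are hypothetical, introduced only for this illustration.

```python
# Mean first-semester grades from Table VII, Part I (scale 1.0 to 5.0),
# ordered by subgroup: 1947-1949, 1950-1951, 1952-1953.
CORE = {
    "English": (3.4, 3.3, 3.3),
    "Science": (3.5, 3.2, 3.4),
    "Social Science": (3.5, 3.4, 3.3),
    "All Subjects": (3.5, 3.3, 3.2),
}
NON_CORE = {
    "English": (3.5, 3.1, 3.2),
    "Science": (3.5, 3.3, 3.5),
    "Social Science": (3.3, 3.3, 3.4),
    "All Subjects": (3.9, 3.4, 3.5),
}


def mean_differences(core, non_core):
    """Return, per subject, (size of difference, favored group) by subgroup.

    "C" marks a difference in favor of the core graduates, "NC" one in
    favor of the non-core graduates, and "--" no difference.
    """
    result = {}
    for subject, core_means in core.items():
        row = []
        for c, nc in zip(core_means, non_core[subject]):
            diff = round(c - nc, 1)  # rounding removes floating-point noise
            if diff > 0:
                row.append((abs(diff), "C"))
            elif diff < 0:
                row.append((abs(diff), "NC"))
            else:
                row.append((0.0, "--"))
        result[subject] = row
    return result


if __name__ == "__main__":
    for subject, row in mean_differences(CORE, NON_CORE).items():
        print(subject, row)
```

On these figures the differences for English, science, and social science do not exceed .2, and those for all subjects run from .1 to .4, each in favor of the non-core graduates.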

Estimated scholastic quartile ranks--How had 
the scholastic status of the two curricular samples,
in reference to quartile ranks, compared in col- 
lege? An item in the questionnaire asked: ‘‘To 
the best of your knowledge (on the basis of schol- 
astic grades), in what quarter of your (class, 
graduating class) (do, did) you rank?’’ Table VIII 
presents a percentage comparison of the findings 
for the two curricular samples. The respective 
percents of mention by the core and the non-core 
respondents concerning their estimated class
quartile ranks were: lowest quarter, 1.0 and 3.8; 
and highest quarter, 37.0 and 34.0. 

In the core sample, 72.0 percent of the respon- 
dents estimated that they had ranked in the upper
half of their classes scholastically, while 71.9 
percent of the non-core respondents had designated 
a similar status. One person in the core group 
had been in the lowest quarter; four respondents 

TABLE VI


A PERCENTAGE COMPARISON OF CORE AND NON-CORE GRADUATES 
ACCORDING TO GRADES ACHIEVED IN FRESHMAN 
ENGLISH COURSES IN COLLEGE 





Grades in Core Graduates* Non-Core Graduates** 
Freshman 1947- 1950- 1952- Total 1947- 1950- 1952- Total 
English 1949 1951 1953 Sample 1949 1951 1953 Sample 











A or equivalent 22.2 16.0 8.3 14.0 15.9 23.1 16.0 
B or equivalent 37.0 32.0 41.7 38.0 36.8 30.7 " 34.0 
C or equivalent 22.2 32.0 b 28.0 34.0 30.7 : 36.0 
D or equivalent 3.7 4.0 2.1 3.0 2.2 ooo 4.6 
Failed the course ooo cco ooo ooo 2.2 9 
Excused from the course 11. 1 8.0 10.4 


Course not required ooo ooo 2.1 1.0 





* 8.0 percent of the sample did not respond. 
** 4.7 percent of the sample did not respond. 

TABLE VII


A COMPARISON OF MEAN GRADES ACHIEVED BY CORE AND
NON-CORE GRADUATES AS COLLEGE FRESHMEN
DURING THE FIRST SEMESTER


                      Core Graduates (b)       Non-Core Graduates (b)
                     1947-   1950-   1952-     1947-   1950-   1952-
                     1949    1951    1953      1949    1951    1953

I. Mean Grades (a)

English               3.4     3.3     3.3       3.5     3.1     3.2
Science               3.5     3.2     3.4       3.5     3.3     3.5
Social Science        3.5     3.4     3.3       3.3     3.3     3.4
All Subjects          3.5     3.3     3.2       3.9     3.4     3.5


                      Core and Non-Core Samples
                     1947-       1950-       1952-
                     1949        1951        1953

II. Differences in Means (c)

English             .1 (NC)     .2 (C)      .1 (C)
Science               --        .1 (NC)     .1 (NC)
Social Science      .2 (C)      .1 (C)      .1 (NC)
All Subjects        .4 (NC)     .1 (NC)     .3 (NC)


(a) Grades ranged from a possible low of 1.0 to a possible high of 5.0.

(b) Sample: 1947-1949, 101; 1950-1951, 66; 1952-1953, 72.

(c) Positive differences in means in favor of the core or the non-core
graduates are indicated by (C) and (NC), respectively.


TABLE VIII 


A PERCENTAGE COMPARISON OF CORE AND NON-CORE GRADUATES 
ACCORDING TO SCHOLASTIC QUARTILE RANK IN COLLEGE 





Estimate of           Core Graduates*              Non-Core Graduates**
Scholastic        1947-  1950-  1952-  Total     1947-  1950-  1952-  Total
Rank               1949   1951   1953 Sample      1949   1951   1953 Sample

Lowest Quarter
Second Quarter
Third Quarter
Highest Quarter

16.0 44.0 28.0 43.8 1.0 4.6 11.4 7.7 7.7 3.8 33.3





* 9.0 percent of the sample did not respond. 
** 7.6 percent of the sample did not respond. 

TABLE IX


A PERCENTAGE COMPARISON OF CORE AND NON-CORE GRADUATES 
WHO WERE HONOR STUDENTS SCHOLASTICALLY IN COLLEGE 





Core Graduates* Non-Core Graduates** 
1947- 1950- 1952- 1947- 1950- 1952- Total 
Honor Student 1949 1951 1953 1949 1951 1953 Sample 











Yes 25.9 28.0 31.3 31.3 34.6 27.8 31.1 
No 55.5 56.0 47.9 ° 42.3 52.8 50.0 


No official recognition 3.7 4.0 5.0 ° " 11.6 8.3 8.1 
by the college 





* 12.0 percent of the sample did not respond. 
** 9.4 percent of the sample did not respond. 


TABLE X 


A PERCENTAGE COMPARISON OF CORE AND NON-CORE GRADUATES 
ACCORDING TO MEMBERSHIP AND NON-MEMBERSHIP IN 
SOCIAL FRATERNITIES OR SORORITIES 





                            Core Graduates               Non-Core Graduates
                        1947-  1950-  1952-  Total     1947-  1950-  1952-  Total
Status of Membership     1949   1951   1953 Sample      1949   1951   1953 Sample

Membership               40.7   56.0   47.9   48.0      56.8   65.3   58.3   59.2
Non-Membership           18.5   32.0   35.4   30.0      18.4    3.8   16.7   14.2
Unavailable at the col-
lege or university       29.6    4.0          14.0      22.8          25.0   23.6
from the non-core sample had specified the same
standing. For the most part, the data indicated
no appreciable differences between the core and
the non-core samples with respect to the
respondents’ estimated scholastic class ranks
during college attendance.

Scholastic honors--In high school, 27.6 percent
of the core graduates and 28.4 percent of the
non-core graduates, from a total sample of 239 in
each curricular group, had been selected for
membership in the National Honor Society; the
award had been made on the basis of scholarship,
character, leadership, and service. Thus, there
had been no perceptible difference between the
two samples with respect to the highest
recognition accorded to high school seniors. What
had been the picture at the college level? An
item in the college section of the questionnaire
asked: ‘‘(Are, Were) you an honor student
scholastically? (e.g. honor roll, dean’s list,
graduate with honors, etc.)’’

The findings, presented in Table IX, revealed
that 29.0 percent of the core graduates and 31.0
percent of the non-core graduates who responded
to the questionnaire survey had been officially
recognized as honor students during their
matriculation at college. These data indicated
that, in general, the percentages of the two
samples attaining scholastic recognition during
college were similar; the findings were
consistent with those for the two groups in
regard to membership in the National Honor
Society at the high school level.

Extracurricular experiences in college--What
had been the extracurricular experiences of the
core and the non-core graduates in college as
revealed by questionnaire items? This section is
devoted to a discussion of certain facets of the
college extracurricular lives of the respondents,
under the headings: (1) Membership and
non-membership in social fraternities or
sororities, (2) Participation in campus
activities and organizations, and (3) Special
recognitions.

Membership and non-membership in social fra- 
ternities or sororities--Table X shows a percen- 
tage comparison of the core and the non-core 
graduates according to membership and non-mem- 
bership in social fraternities or sororities during 
college. 

Approximately one half of the core sample had
belonged to fraternities or sororities during
college, whereas about three fifths of the
non-core respondents had been members. The
percentage of core graduates who reported that
they had not belonged to a fraternity or a
sorority was more than twice that of the non-core
respondents. In regard to the factor of
availability, 14.0 percent of the core and 23.6
percent of the non-core sample members specified
that fraternities or sororities had been
unavailable at the colleges they attended. So, in
effect, availability had been more limited for
the non-core respondents, yet they had shown a
substantially larger membership in social
fraternities and sororities than the core
graduates.

Participation in campus activities and
organizations--What had been the social activity
patterns of the two curricular groups during
college? Table XI shows a percentage comparison
of the core and the non-core respondents
according to their participation in campus
activities and organizations during college.

For the most part, the patterns of participation 
in activities classified under ten arbitrary cate- 
gories were quite similar for the two samples ex- 
cept for discernible variations in dramatics, re- 
ligious activities, sports, and student govern- 
ment. The non-core respondents reported more 
appreciable incidences of participation in sports, 
dramatics, and student government, whereas the 
core graduates tended to have been associated 
more readily with religious activities or organi- 
zations. 

Special recognitions--How did the social
distinctions attained by the two curricular
groups during college compare? Table XII shows
that the respective percentages for the core and
the non-core samples attaining special
recognitions at college were: class officers,
10.0 and 9.3; officers of non-honorary
organizations, 27.0 and 32.1; members of honorary
organizations, 17.0 and 11.3; special awards,
17.0 and 15.8; miscellaneous, 0.0 and 2.8; and
none, 34.0 and 42.5.

Thus, about one third of the core graduates and 
approximately two fifths of the non-core respon- 
dents had received no special recognitions. In re- 
gard to class officers and special awards, the two 
samples had been about equally represented. The 
core respondents had tended to receive more re- 
cognitions in honorary organizations, while the 
non-core graduates had been selected more fre- 
quently as officers in non-honorary organizations. 
In general, the incidences of special recognitions 
within the two samples were somewhat consistent. 





Conclusions of the Study 





Some of the findings of the study are of a fac- 
tual nature; other data represent the opinions of 
graduates. Three general conclusions, which 
emanated logically from the data of the study, are 
listed below: 

1. The core graduates had been as well prepared
for college matriculation as had the non-core
graduates.--The core graduates had been accepted
by colleges as readily as the graduates of the
conventional curriculum, and they had attended
colleges as frequently as the non-core members.
Although the attendance patterns of the two
samples differed somewhat in relation to the
general types and the sizes of colleges,
apparently the desires or the preferences of the
core graduates had not been restricted.
Approximately nine tenths of the graduates in
each sample felt that their general college
preparation had been adequate or more than
adequate. About one third of each curriculum
sample specified inadequacies in mechanics of
grammar in relation to Freshman English courses;
the general patterns of deficiencies in English
were similar for the core and the non-core
groups.

2. The core graduates had achieved academically
in college as well as had the non-core
graduates.--This conclusion has been reached in
the light of significant evidence secured from
the questionnaire and the high school records.
The mean grades achieved by the core and the
non-core graduates during their first semester as
college freshmen were very much alike. The
findings of the questionnaire revealed no
discernible differences in the academic
achievements of the two samples on the criteria
of estimated class ranks and scholastic
recognitions.

3. The social achievements of the core and the
non-core graduates during college attendance had
been somewhat similar.--Approximately half of the
core sample had belonged to fraternities or
sororities during college, while almost three
fifths of the non-core graduates had been members
of these social organizations. About two thirds
of the core and three fifths of the non-core
graduates had received special recognitions of a
social nature during their college years. The
core sample members had tended to receive more
distinctions in honorary organizations, whereas
the non-core graduates had been selected more
frequently as officers in non-honorary groups.


FOOTNOTES 


* Adapted from a dissertation presented in
partial fulfillment for the degree of Ed.D. at
Northwestern University.

1. Dictionary of Education, p. 114. Prepared un- 
der the auspices of Phi Delta Kappa, Carter 
V. Good, editor. New York: McGraw-Hill 
Book Co., Inc., 1945. 

2. Wilford Aikin, The Story of the Eight-Year 
Study. New York: Harper and Brothers, 1942. 
A STUDY OF THE VIEWPOINTS HELD BY
SCHOOL ADMINISTRATORS REGARDING
VOCATIONAL EDUCATION IN
THE SECONDARY SCHOOL


FRANK J. WOERDEHOFF and RALPH R. BENTLEY 
Purdue University 


SECTION I
INTRODUCTION


THE EDUCATIONAL viewpoints held by school 
administrators are presumed to be important fac- 
tors in determining curriculum offerings in the 
secondary school. Of course, the educational 
practices in the school system may or may not 
conform to the educational viewpoints held by the 
administrator. It must be recognized that the 
school administrator cannot always translate his 
educational viewpoints into practice because of a 
host of conditions. Nevertheless, in terms of 
probable inference, the school administrators are 
in a favorable position to exert influence on the 
curriculum design of the secondary school. Con- 
sequently, it is reasonable to assume that their 
viewpoints regarding vocational education contrib- 
ute much toward the degree of acceptance or re- 
jection of this phase of secondary education and 
the way in which the program is carried out. 
Therefore, this study was undertaken to secure 
the viewpoints of Indiana school administrators 
regarding pertinent questions dealing with voca- 
tional education. 


Description of Study 





This study was cooperatively planned and carried
out as a team research project by three members
of the Purdue University vocational teacher
education faculty representing vocational
agricultural education, vocational home economics
education, and trade and industrial education.
The study received the endorsement and support 
of the presidents of the following organizations: 


1. Indiana Association of Town and City
Superintendents,
2. Indiana County Superintendents’ Association,
and
3. Indiana Association of Secondary School
Principals.


Purpose of This Study 





The purpose of this study was to discover the
viewpoints of Indiana school administrators
regarding (1) vocational education in general,
(2) vocational agriculture, (3) vocational home
economics, and (4) vocational trade and
industrial education, and to determine whether
there were significant differences among school
administrators categorized according to (1) type
of administrative position, and (2) experience
with vocational education programs.


Definition of Terms 





Certain terms used in this study are defined as
follows:

Viewpoint. The term ‘‘viewpoint’’ may be de- 
fined as an affectively toned idea or group 
of ideas predisposing a person to action 
with reference to a specific object. 

School Administrator. The term ‘‘school ad- 
ministrator’’ in this study will refer to 
(1) county superintendents, (2) superin- 
tendents of independent school districts 
(city and town), (3) city secondary school 
principals, and (4) county secondary school 
principals (principals who administer 
schools under the jurisdiction of county 
superintendents). 

Vocational Education. The term ‘‘vocational
education’’ as used in this study refers to 
three of the four vocational education pro- 
grams in the secondary schools which are 
approved for federal reimbursement by 
the Indiana State Department of Public In- 
struction. These vocational education pro- 
grams include home economics education, 
agricultural education, and trade and indus- 
trial education. 








Research Procedure and Sample 








The data upon which this study was based were 
collected by means of a questionnaire designed 
TABLE I


NUMBER OF SECONDARY SCHOOL PRINCIPALS BY 
SIZE OF SCHOOL WHERE EMPLOYED 





Number of 
Size of School Principals 





Less than 100 169 


100 - 249 
250 - 499 
Over 500 


Total 





TABLE II 


NUMBER OF SECONDARY SCHOOL PRINCIPALS HAVING AND NOT 
HAVING EXPERIENCE WITH VOCATIONAL EDUCATION 





Experience 


Number Number Not 
Vocational Area Having Having 








Vocational Agriculture 392 122 
Vocational Home Economics 433 81 


Vocational Trade and 
Industrial Education 132 





TABLE III


NUMBER OF ADMINISTRATORS BY TYPE OF ADMINISTRA- 
TIVE POSITION 





Administrative Position 





County Superintendents 
City Superintendents 
County Principals 

City Principals 


Total 
(1) to obtain personal data concerning type of
administrative position, and experience with
vocational educational programs, and (2) to
obtain the school administrators’ viewpoints
regarding vocational education in general,
vocational agriculture, vocational home
economics, and vocational trade and industrial
education.

The items in the questionnaire were prepared by
the investigators and were reviewed by Purdue
University Division of Education faculty members
concerned with school administration, general
secondary education, and vocational education.
The items were revised in light of the criticisms
and suggestions of staff members.

The questionnaire included 112 items which 
were distributed as follows: (1) 20 items on gen- 
eral vocational education, (2) 30 items on voca- 
tional agriculture, (3) 30 items on vocational home 
economics, and (4) 32 items on vocational trade 
and industrial education. An opportunity was pro- 
vided for the respondents to indicate whether they 
strongly agreed, agreed, were undecided, dis- 
agreed, or strongly disagreed with each item. 

The questionnaires were sent to all of the county
and city superintendents and secondary school
principals in Indiana. Of the 1027 questionnaires
sent, 712 or nearly 70 percent were returned. The
distribution and description of the
administrators are shown in Tables I, II, and
III.
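
The item counts and the return rate reported above can be checked with a short piece of arithmetic. The following Python sketch is offered only as an illustration (the study itself reports no computer code); the variable names are hypothetical, and the figures are those given in the text.

```python
# Figures as reported in the text; variable names are illustrative only.
items_per_area = {
    "general vocational education": 20,
    "vocational agriculture": 30,
    "vocational home economics": 30,
    "vocational trade and industrial education": 32,
}
total_items = sum(items_per_area.values())  # 112, the stated questionnaire length

questionnaires_sent = 1027
questionnaires_returned = 712

# Return rate as a percentage, rounded to one decimal place.
return_rate = round(100 * questionnaires_returned / questionnaires_sent, 1)

print(total_items, return_rate)
```

The computed rate of 69.3 percent agrees with the description ‘‘nearly 70 percent.’’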

The data were tabulated for respondents cate- 
gorized according to (1) type of administrative po- 
sition, and (2) experience with specific vocational 
education programs. The responses ‘‘strongly
agree’’ and ‘‘agree’’ were combined, as were the
responses ‘‘strongly disagree’’ and ‘‘disagree.’’
Percentages of responses were computed for the 
administrators in each of the above categories 
and for the total group. 
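
The tabulation described above, in which the five response options are collapsed into three and expressed as percentages, may be sketched as follows. This is a minimal illustration in Python under stated assumptions: the function name, the category labels, and the sample responses are hypothetical, and a missing answer is assumed to be counted as ‘‘no response.’’

```python
from collections import Counter

# Map each of the five response options onto the three reported categories.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "undecided": "undecided",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}

def collapsed_percentages(responses):
    """Return the percentage of respondents in each collapsed category.

    Any value not among the five options (for example None) is counted
    as "no response"; this handling is an assumption of the sketch.
    """
    counts = Counter(COLLAPSE.get(r, "no response") for r in responses)
    total = len(responses)
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

# Hypothetical responses of ten administrators to one item (not study data).
sample = (["strongly agree"] * 3 + ["agree"] * 4
          + ["undecided"] + ["disagree"] + [None])
result = collapsed_percentages(sample)
print(result)
```

The same function could be applied separately to each category of administrator and to the total group.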

The chi-square technique was used to ascer- 
tain whether or not there were significant differ- 
ences in the responses to each item wherever the 
numbers were sufficiently large. Comparisons 
were made between: 


1. County superintendents and city
superintendents,
2. County principals and city principals,
3. County superintendents and county principals,
4. City superintendents and city principals,
5. Secondary school principals who have and have
not had administrative experience with
vocational education programs.
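
The computational details of the chi-square comparisons are not given in the text. The following Python sketch assumes the ordinary Pearson chi-square for a contingency table of counts, applied to two groups of administrators and the three collapsed response categories. The counts are hypothetical and are not data from the study; the critical values are the standard tabled values for two degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis of no difference between groups.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Tabled upper-tail critical values of chi-square for df = 2.
CRITICAL_DF2 = {0.05: 5.991, 0.01: 9.210}

# Hypothetical counts (agree, undecided, disagree) for one item; not study data.
county_group = [60, 15, 25]
city_group = [40, 20, 40]

statistic, df = chi_square([county_group, city_group])
significant_05 = statistic > CRITICAL_DF2[0.05]
significant_01 = statistic > CRITICAL_DF2[0.01]
print(round(statistic, 2), df, significant_05, significant_01)
```

With these illustrative counts the difference would be reported as significant at the .05 level but not at the .01 level. The condition that numbers be ‘‘sufficiently large’’ is usually understood to concern the expected counts in each cell, although the study does not state the rule it followed.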


SECTION II
ANALYSIS OF DATA

The results of this study are shown in Tables IV,
V, VI, and VII. These tables show the percentages
of administrators who agreed, disagreed, were
undecided, or did not respond to the items in the
questionnaire dealing with various aspects of
vocational education. Also, these tables show the
significant differences between the responses of
administrators when grouped according to type of
position. In Tables V, VI, and VII, the
significant differences between administrators
who have and have not had experience with
vocational education are shown.


Vocational Education 








The findings shown in Table IV reveal that 80
percent of the administrators believed that the
success of a local program of vocational
education depends largely upon the degree to
which they encourage and support the program.
Among these administrators, 27 percent believed
that vocational education caused too many
administrative problems. Nevertheless, 91 percent
believed that vocational education should be
provided in the high school, and 84 percent
agreed that skills for earning a living are as
important as skills for social living.
Furthermore, 94 percent believed that vocational
education courses deserve credit equal to
academic courses in the curriculum. Only five
percent expressed the view that bright pupils
should be discouraged from taking vocational
courses. Seventy-one percent of the
administrators indicated that the per pupil cost
for vocational education is justifiable and 82
percent believed that the enrollment per class
should exceed 25 pupils. The views administrators
expressed varied widely with regard to all pupils
being interested in vocational subjects, the
socio-economic level of vocational education
students, the necessity of vocational education
for all pupils, and whether vocational education
should be general or specific.

Approximately 50 percent agreed that federal
funds are desirable to finance vocational
education. The majority of the administrators
were uncertain or were in disagreement concerning
the extent to which the State Department of
Public Instruction exercises control over
federally reimbursed vocational education
programs. Fifty-six percent believed that state
and federal funds should be available to match
equally local school funds for the travel costs
of teacher supervision of pupil projects.
Two-thirds of the administrators were opposed to
state and federal agencies setting time
allotments for classes in vocational education.
However, 82 percent agreed to having the State
Department of Public Instruction set standards
and approve local vocational education
facilities.


Vocational Education in Agriculture 





Table V shows that the vast majority of
administrators agreed that vocational agriculture
should be an elective course, and, consequently,
did not agree that either freshman boys in rural
areas or all farm boys should be required to take
courses in vocational agriculture. Eighty-six
percent agreed that farm shop instruction should
be given when a school maintains a department of
vocational agriculture. Nearly three-fourths
favored a state course of study for vocational
agriculture. There was divided opinion among
administrators regarding the practice of
enrolling only those students in vocational
agriculture who have facilities available for
supervised farm practice. Ninety-two percent of
the administrators believed that the teacher of
vocational agriculture should visit his students
on their home farms in order to supervise farm
practice work, and 89 percent believed that at
least three such visits should be made by the
teacher each year. However, only 66 percent
agreed to recognize time needed for making farm
visits as a part of the agriculture teacher’s
workload. Then, too, 92 percent regarded field
trips as an essential part of good instruction in
vocational agriculture, and 78 percent agreed
that school owned and operated buses should be
used for making field trips. Sixty-nine percent
of the administrators agreed that a school that
maintains a department of vocational agriculture
should have a Future Farmers of America chapter.
More than three-fourths of the administrators
believed that the Future Farmers of America
organization aids students in developing
desirable social, civic, and vocational interests
and abilities. There was considerable difference
of opinion among administrators regarding the
justification of pupils and teachers being absent
from school to participate in F.F.A. and other
vocational agriculture activities.

Three-fourths of the administrators indicated 
that the cost for facilities and equipment for voca- 
tional agriculture could be justified and that de- 
partments of vocational agriculture in Indiana high 
schools should be maintained where needed even 
though they are not reimbursed by federal funds. 

There was no common agreement among Indi- 
ana school administrators regarding the schools’ 
responsibility for organizing and conducting class- 
es for young and adult farmers and regarding hav- 
ing agricultural agencies other than the public 
school assume this responsibility. 

The findings indicate that these school admin- 
istrators believed that vocational agriculture 
teachers are as well qualified and as competent 
as are other teachers in the secondary school. 
Eighty-five percent did not believe that the daily 
work load of vocational agriculture teachers is 
greater than that of other teachers, and 71 per- 
cent agreed that agriculture teachers should be 
employed for twelve months. Only 60 percent 
agreed that vocational agriculture teachers should 
be responsible for 4-H clubs in the community. 

Three-fourths of the administrators believed that
in each school which maintains a department of
vocational agriculture an advisory committee
should be appointed to work with the teacher of
agriculture and the administrator.





Vocational Education in Home Economics 





The viewpoints held by Indiana school
administrators regarding Vocational Home
Economics are shown in Table VI. In one instance,
nearly three-fourths of the administrators
indicated that they believed that vocational
homemaking courses should be elective, and in
another instance, 67 percent indicated that
homemaking courses should be required of all
girls. Fifty-one percent believed that homemaking
education is as important for boys as for girls.
However, 90 percent indicated that boys as well
as girls should have family living courses in
order to prepare them for their responsibilities
as homemakers. Approximately 95 percent of the
administrators agreed that education for
homemaking is as important for girls of superior
ability as for those of lesser ability,
regardless of their socio-economic background.

A large majority of the school administrators
believed that directed home and community
learning experiences should be required of each
vocational homemaking pupil.


Eighty-three percent agreed that a good home- 
making program will include provisions for home 
visitation to help students with their home projects, 
and 77 percent believed that vocational homemak- 
ing teachers should visit their students in their 
homes at least twice a year. 

It was found that nearly 80 percent agreed that 
travel expenses should be provided to encourage 
teachers to visit the homes of their students. Only 
66 percent of the administrators would consider 
the time required for making such visits as a rec- 
ognized part of the homemaking teacher’s load. 

It may be observed from Table VI that
approximately four-fifths of the administrators
agreed that the Future Homemakers organization is
a worthwhile organization which provides
situations that help students to further develop
their leadership abilities and homemaking skills
and abilities. Seventy-one percent believed that
each department of homemaking should organize an
F.H.A. chapter. In contrast, the administrators
were less inclined to regard the vocational
homemaking teacher as being responsible for the
girls’ 4-H club work in the community.


Sixty-seven percent of the administrators agreed
that it was desirable to have an advisory
committee to work with the homemaking teacher in
each school that maintains a vocational
homemaking department. The data indicated that
about one out of every five Indiana school
administrators was undecided about the
desirability of this practice.

There appeared to be much uncertainty among
administrators regarding the public schools’
responsibility for adult homemaking education.
Only 30 percent of the administrators agreed that
classes for adult homemakers should be conducted
in schools that maintain departments of
vocational homemaking.

The data in this study indicated that school
administrators regard vocational homemaking
teachers as being as well prepared for their
jobs, as cooperative, and as capable in teaching
situations as other teachers. The teaching load
of vocational homemaking teachers was regarded by
70 percent of the administrators as at least
equal to that of other teachers in the high
school. Two-thirds believed that teachers of
homemaking should be employed for a period of
time extending beyond the school year.


Trade and Industrial Education 





In Table VII the viewpoints of Indiana school
administrators regarding trade and industrial
education are shown. Approximately 70 percent of
these administrators believed that trade and
industrial education should be included as a part
of the high school program and only 13 percent
believed that this vocational educational program
was too costly. Over 75 percent believed that the
objective of trade and industrial education
should be to prepare young people for useful
employment and that the instructional equipment
and tools should be comparable to the type used
in industry. One-half of these administrators
agreed that high school girls should be provided
an opportunity to enroll in industrial education
training programs.

There seemed to be much uncertainty among 
administrators as to whether the public school or 
industry should train adult workers. This is evi- 
denced by their responses to the adult education 
items shown in Table VII. The administrators ex- 
pressed viewpoints which indicated that they were 
uncertain or did not believe that the public school 
had a responsibility for training out-of-school 
youth, semi-skilled workers, apprentices, and 
supervisory and foremen personnel. 

Although there was considerable uncertainty 
among administrators’ opinions regarding the pub- 
lic schools’ responsibility for providing vocation- 
al trade and industrial education through coopera- 
tive work-education programs, the majority opin- 
ion would support such a program. The majority 
opinion likewise favored having a specially trained 
teacher-coordinator for the program, agreed that 
special curriculum materials should be provided 
for related in-school instruction and believed that 
students should receive high-school credit for co- 
operative work-education. The viewpoints ex- 
pressed by administrators varied widely with re- 
gard to students who are enrolled in cooperative 
work-education programs receiving both wages 
and high-school credit. 

The viewpoints of school administrators indicated
that they believed that a qualified director or
coordinator should be employed to administer the
trade and industrial program of the school. The
administrators favored the establishment of
advisory committees for trade and industrial
education. Their opinions, however, suggested
that they were uncertain about the composition of
such committees. While 85 percent agreed that
employer groups should be consulted for advice,
only 56 percent believed that union and employee
groups should be consulted.

The viewpoints expressed by school administra- 
tors regarding the qualifications and competencies 
of trade and industrial teachers were similar to 
those expressed regarding other vocational teach- 
ers. 


Significant Differences 








The findings shown in Tables IV-VII of this study
reveal that 90 statistically significant
differences occurred at either the .01 or the .05
level. These differences were distributed as
follows:


1. Seventeen when the viewpoints of city
superintendents were compared with those of
county superintendents.

2. Twenty-three when the viewpoints of city
principals were compared with those of county
principals.

3. Six when the viewpoints of county
superintendents were compared with those of
county principals.

4. Five when the viewpoints of city
superintendents were compared with those of
city principals.


These data suggest that there is greater agree- 
ment among city administrators and among county 
administrators respectively than when city admin- 
istrators are compared with county administrators. 

When the viewpoints of principals having had
experience with vocational education were
compared with those of principals not having had
such experience, significant differences were
found for 39 items. It should be further noted
that 34 of the 90 significant differences
occurred for items dealing with vocational
teacher competencies. Of the 90 significant
differences, 42 occurred for items dealing with
trade and industrial education. With few
exceptions, those principals having experience
with vocational education expressed the more
favorable viewpoint.
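
The counts of significant differences reported in this section can be verified by simple addition. The Python sketch below uses only the figures stated in the text; the variable names are hypothetical and serve merely to label the comparisons.

```python
# Numbers of significant differences as reported in the text.
by_position = {
    "city vs. county superintendents": 17,
    "city vs. county principals": 23,
    "county superintendents vs. county principals": 6,
    "city superintendents vs. city principals": 5,
}
by_experience = 39  # principals with vs. without vocational-education experience

position_total = sum(by_position.values())
grand_total = position_total + by_experience  # should equal the 90 reported

# City-versus-county comparisons against comparisons within one setting.
between_settings = (by_position["city vs. county superintendents"]
                    + by_position["city vs. county principals"])
within_settings = (by_position["county superintendents vs. county principals"]
                   + by_position["city superintendents vs. city principals"])

print(position_total, grand_total, between_settings, within_settings)
```

The four position comparisons account for 51 differences and the experience comparison for 39, which together give the 90 reported. The 40 differences between city and county groups, against 11 within either setting, are the basis for the statement that agreement is greater within each setting than between settings.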


SECTION III 
SUMMARY AND GENERALIZATIONS 


This study was made as an effort to discover the
viewpoints of secondary school administrators on
pertinent questions regarding vocational
education in the secondary school. The inquiry
focused attention upon those principles, policies
and practices which are most acceptable and least
acceptable to school administrators and furnishes
a basis for re-examining vocational education in
the secondary schools of Indiana. Although these
data are based upon the viewpoints of Indiana
school administrators, it is possible that the
findings suggest implications to be considered
for vocational education programs elsewhere.

The data-gathering instrument was a questionnaire
which included 112 items distributed as follows:
(1) 20 items on vocational education, (2) 30
items on vocational agriculture, (3) 30 items on
vocational home economics, and (4) 32 items on
vocational trade and industrial education. The
questionnaire provided an opportunity for
respondents to indicate whether they strongly
agreed, agreed, were undecided, disagreed, or
strongly disagreed. The data collected were
treated to show percentages of agreement,
uncertainty and disagreement with items in the
questionnaire found among city superintendents,
city high school principals, county
superintendents and county high school
principals. Chi-square tests were applied to
discover significant differences that might exist
among secondary school administrators when
grouped by type of position held.

On the basis of the administrators’ viewpoints
expressed in this study the following
generalizations may be made:


1. School administrators believe that providing
opportunities for vocational education is an
important responsibility of secondary education.

2. Superintendents and secondary school princi- 
pals believe that successful programs of vo- 
cational education depend toalarge extent up- 
on the degree to which they encourage and 
support the program. 

3. While school administrators view themselves 
as having a key role in the development of vo- 
cational education programs, they favor hav- 
ing local advisory committees appointed to 
counsel with school administrators and teach- 
ers of vocational subjects. 

4. The cost of vocational education courses, al- 
though higher than for most subjects, is con- 
sidered justifiable by the majority of school 
administrators. 

5. School administrators do not believe that
vocational education programs create too many
administrative problems.





6. There seems to be a lack of understanding 
among school administrators regarding the 
_extent to which state and federal authorities 
regulate local school programs of vocational 
education. 

7. The time allotment requirement for class
instruction is a policy which administrators
believe should not be determined by state
and/or federal agencies.

8. School administrators favor having the state
department of public instruction determine
standards for approving the facilities for local
departments of vocational education.

9. School administrators are not strongly opposed
to the use of federal funds for vocational
education.

10. School administrators believe that vocational
education courses should be elective.

11. School administrators do not believe or are
uncertain about the responsibility of the secondary
school for providing vocational education
opportunities for out-of-school youth and
adults.

12. School administrators do not believe that
bright students should be discouraged from
taking vocational education courses.

13. The opinions of city and county school
administrators regarding the socio-economic level
of pupils enrolled in vocational education
courses are significantly different.

14. School administrators view teachers of vocational
subjects as being comparable to other
teachers with respect to training, professional
attitudes and willingness to cooperate in school
and community activities.

15. School administrators agree that home and
farm supervisory visits are an integral part
of the instructional program in vocational
agriculture and vocational home economics.

16. School administrators view the Future Farmers
of America and Future Homemakers of America
organizations as desirable co-curricular
activities for schools maintaining departments
of vocational agriculture and homemaking.

17. The viewpoints expressed by secondary school
administrators with regard to vocational trade
and industrial education appear to be more
closely associated with rural and urban factors
than with the type of position held.

18. School administrators having had experience
with vocational education programs tend to hold
more favorable viewpoints than those not having
had such experience.
THE PRESENT study presents evidence as to
(a) the difference in attitudes among a selected 
group of undergraduate college women with refer- 
ence to the guidance of children, and (b) the effect 
upon these attitudes of an introductory course in 
child development and guidance examined in rela- 
tion to socio-economic status, intelligence, size 
of family, ordinal position, academic achievement, 
and perception of childhood happiness. 


The Subjects 


The students who served as subjects of the in- 
vestigation were 156 majors in the Division of 
Home Economics at Oklahoma Agricultural and 
Mechanical College. The students were divided 
into two groups, an experimental group and a control
group. The experimental subjects were enrolled
in an introductory course in child development
and guidance. The control subjects had not
completed or were not currently enrolled in this 
course. A summary description of the subjects 
is presented in Table I. 

The criteria for the selection of subjects were 
as follows: (a) white, (b) single, (c) reared in the 
United States, (d) 17-24 years of age, (e) female, 
(f) home economics major, and (g) American
Council on Education Psychological Examination 
completed. Students who had completed the course 
or who were enrolled in other child development 
and guidance courses were excluded from this in- 
vestigation because it was believed that their re- 
sponses might introduce a bias into the results. 

A comparison of the 76 experimental and the 
80 control subjects examined in relation to (a) 
scholastic aptitude as measured by the American 
Council on Education Psychological Examination, 
(b) year in school, (c) socio-economic status as 
measured by the McGuire-White Index of Social 
Status (Short Form), and (d) academic achievement
as measured by freshman grade point average,
evidenced no statistically significant differences.


Description of the Instruments and Rationale 
of Method 





Of the instruments which have been designed to 
assess attitudes concerning the guidance of chil- 
dren there are two of a paper-and-pencil nature 
whose adequacy seemed sufficient to warrant their 
use in this investigation. They are the University 
of Southern California Parent Attitude Survey de- 
veloped by Shoben (10) and the Child Guidance Sur- 
vey developed by Wiley (14). 

In the initial construction of the University of 
Southern California Parent Attitude Survey by Sho- 
ben, a scale of 148 items revealing attitudes con- 
cerning the guidance of children was presented to 
a group of 100 white, urban mothers, 50 of whom 
had problem children and 50 of whom had non-prob- 
lem children. The ‘‘problem’’ group consisted of 
children who (a) were receiving clinical help for 
some personality or behavior problem, or who 
(b) had come into the custody of the juvenile authorities
at least twice, or who (c) had a problem about
which the child’s mother had registered a com- 
plaint indicating that she would like to have clinical 
help with her child if it were available, or if she 
could afford it. The ‘‘non-problem’’ group consist- 
ed of children who (a) had never received clinical 
attention, who (b) had never been taken into 
custody by juvenile authorities, and who (c) had no 
problem for which, in the opinion of the mother, 
clinical help was either desirable or necessary. 

In Shoben’s preliminary investigation those 
items which differentiated the two groups of 
mothers at the five percent level of confidence or 
beyond were retained. As a result of this proce- 
dure, 85 of the original 148 items were retained. 
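
The item-screening step can be sketched as follows. The source does not state which statistical test Shoben applied, so the two-proportion z-test used here is an assumption made only for illustration; the item labels and agreement counts are likewise hypothetical. Only the group size of 50 mothers and the five percent criterion come from the description above.

```python
import math


def two_proportion_z(agree_a, n_a, agree_b, n_b):
    """z statistic for the difference between two independent proportions."""
    p_a = agree_a / n_a
    p_b = agree_b / n_b
    pooled = (agree_a + agree_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 0.0  # every respondent answered alike; no difference to test
    return (p_a - p_b) / standard_error


GROUP_SIZE = 50        # mothers per group, as reported
Z_CRITICAL_05 = 1.96   # two-sided five percent level

# HYPOTHETICAL tallies: item -> (agreeing "problem" mothers,
#                                agreeing "non-problem" mothers)
item_agreement = {
    "item_1": (35, 20),
    "item_2": (26, 24),
    "item_3": (10, 25),
}

retained = [
    item
    for item, (problem, non_problem) in item_agreement.items()
    if abs(two_proportion_z(problem, GROUP_SIZE, non_problem, GROUP_SIZE))
    >= Z_CRITICAL_05
]
print(retained)
```

An item is kept only when the two groups of mothers answer it differently by more than chance would readily explain; items answered alike by both groups are dropped.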


* Adapted from a Ph.D. thesis completed at Florida State University. 


**The author wishes to express his gratitude to Dr. Ruth Connor for her guidance of this study, and to 
the Oklahoma A. and M. College Research Foundation for its support. 
TABLE I

DESCRIPTION OF SUBJECTS

                                                        Experimental   Control
Description              Classification                   (N = 76)     (N = 80)

Year in School           Sophomores                          67           66
                         Juniors                              9           14

Ordinal Position         Only                                11           14
                         Oldest                              21           28
                         Middle                              19           19
                         Youngest                            25           19

Number of Children       Two or less                         37           39
in Family                More than two                                    41

Index of Social Status   Upper
                         Upper-middle
                         Lower-middle
                         Upper-lower

Academic Achievement     Freshman grade-point average
A.C.E. Score             Mean

Residence                Rural
                         Urban

Childhood Happiness      Very happy
Rating                   Happy
                         Average
                         Unhappy
                         Very unhappy
The items appear in three subscales, i.e., Ignoring,
Possessive, and Dominant. The Survey was
then given to 40 mothers, equally divided between
the problem and non-problem categories. The
amount of shrinkage in terms of the magnitude of
the correlation coefficients which serve as indices
of the Survey’s validity was not excessive, and the
measures of validity obtained from the second
administration were as follows: Ignoring, .624;
Possessive, .721; Dominant, .623; and for the Total
Scale, .769.

The Child Guidance Survey is a scale consisting
of 160 items designed to assess attitudes concerning
the guidance of children. The Survey is composed
of eight parts: (a) general home standards; (b) verb- 
al behavior; (c) expression of hostility; (d) wean- 
ing, thumb-sucking, and feeding; (e) toilet train- 
ing; (f) sexual behavior; (g) boy-girl differences; 
and (h) crying. 

Utilizing the responses of 172 subjects, meas- 
ures of reliability were obtained by Wiley (14) for 
each of the first seven parts of the Survey. In 
each instance the measure obtained was above .80. 
A measure of reliability was not obtained for the
eighth part, i.e., crying, because of its small 
number of items. 

In order to obtain a measure of the validity of 
the instrument, clinical judgments concerning the 
‘‘sophistication’’ of the groups taking the test
were made. For example, it was believed that if
the test measured what it purported to measure,
experienced clinicians, persons who had been
counseled with regard to their children’s problems,
and persons who had had the advantage of
special instruction in child development would be 
likely to express attitudes which were more favor- 
able than would those who had had little experi- 
ence with children. The group of 172 subjects 
whose responses were analyzed for this portion 
of Wiley’s study tended to bear out this hypothesis. 
A detailed description of the methodology em- 
ployed in the validation of this instrument has 
been presented elsewhere (14). 

The problem of whether attitude tests of a 
paper-and-pencil nature are sufficiently satisfac- 
tory to warrant their use in a serious investiga- 
tion is not new. The view is frequently expressed 
that questionnaires assess surface phenomena, 
and that many unconscious forces are left untapped 
by the use of such a technique. Yet the use of so- 
called depth techniques is not within the limits of 
many serious investigations of attitudes. Another 
important question which has not been satisfactor- 
ily answered concerns the relative merits of the 
paper-and-pencil questionnaire and the interview. 
Are the advantages of the interview sufficient to 
warrant its use in the large portion of research
studies in which paper-and-pencil questionnaires
are used? In a study by Stouffer, et al. (11)
questionnaires and closed-end personal interviews
were found to yield nearly identical information.


The work of Metzner and Mann (7) and Kahn (3)
also indicates remarkable similarity between
responses obtained by questionnaires and by open-end
interviews. When a difference was observed,
it was found that the responses to the question- 
naires were more highly predictive in terms of 
overt behavior than were responses obtained by 
the interview. But it must be recognized that 
many investigators believe that the use of the inter- 
view is superior to paper-and-pencil techniques 
in attitude measurement. In fact, the interview 
is not infrequently used as a criterion for rating 
the validity of questionnaires (1,13). However, 
Nye (9) indicates that the clinical interview has
been uncritically accepted as a perfect instrument
while the reliability and validity of paper-and-pencil
tests have been critically checked. Recent
research by Kelly and Fiske (4) offers evidence
which indicates the fallibility of the human observer
in an interview setting. Too, the cost of the
interview is in many instances prohibitive, and
there are many instances in which it is not expedient
to use the interview. For example, in studies
measuring the effectiveness of educational programs,
such as the study
herein reported, it is important for all of the sub- 
jects to be tested immediately prior to the onset 
of the experiment. In such instances, question- 
naires are frequently used (2, 5, 8). 


Administration of the Instruments 





The University of Southern California Parent 
Attitude Survey and the Child Guidance Survey 
were administered to the students prior to the be- 
ginning of classes in September, and after the 
course ended in January. Although the instructors
of the various experimental sections were aware 
that their students were participating in a research 
study, they were not given details of the investiga- 
tion, nor were they aware of the nature of the in- 
struments being used. The instructors were not 
present at the testing sessions. 

In order to conceal the identity of the scales, 
when they were mimeographed for student use, 
they were designated only as Inventory A and In- 
ventory B. Several students who were unable to 
complete the scales at the designated time com- 
pleted them by special appointment. All of the 
scales were machine scored. The scales were
administered by the writer and by other staff members
not engaged in teaching the introductory child
development and guidance course.


The Experimental Program 








The introductory course in child development 
and guidance required of all students majoring in 
the School of Home Economics at Oklahoma A. 
and M. College constituted the educational program
to which the experimental group was exposed.
Each week in the course, two fifty-minute
periods are devoted to theory and two periods to
laboratory, and one fifty-minute observation in
one of the college nursery school-kindergarten
programs is required of each student. Enrollment
in the sections is maintained at approximately
25 students.

The class laboratory sessions are devoted to 
such experiences as showing films; discussing 
the individual children and their families with the 
nursery school teacher leading the discussion; an- 
alyzing case studies of children of preschool age; 
experimenting with creative media such as finger 
paint, clay, and easel paint; listening to chil- 
dren’s records; and reading and discussing chil- 
dren’s books. 

A rich variety of experiences is afforded the
students in the nursery school-kindergarten programs.
Although the role the students assume is
primarily that of observers, the students are included
in activities whenever possible and are provided
such experiences as setting the tables for
lunch, helping to prepare lunch, and eating with
the children. Too, the students not infrequently
are allowed to assist in taking the children on
field trips; to help with projects when help is
requested by the children; to read stories to the
children; to play records; to prepare materials,
for example by mixing finger paint, easel paint and
dough clay; and to dress the children following their
rest period.

The text which was used in the course was the 
fifth edition of Rand, Sweeny, and Vincent’s 
Growth and Development of the Young Child,
published by W. B. Saunders Company in 1953. Many
supplementary readings were also used. 

The course is taught from a preparental ap- 
proach and emphasizes everyday problems, rou- 
tines, and typical responses of young children. 

In terms of course content, consideration is 
given to such factors as (a) basic needs and devel- 
opmental tasks of young children, (b) principles 
of development, (c) principles of guidance, (d) 
parent-child relationships, and (e) learning. At- 
tention is given to creative media suitable for 
young children, to the emotional and social devel- 
opment of young children, and to the importance 
of the family as a socializing influence. Such 
films as The Terrible Two’s and the Trusting
Three’s and The Frustrating Four’s and the
Fascinating Five’s are used.

Classes are designed so that an exchange of
ideas is encouraged. Classes are friendly and
informal, and students are given considerable
psychological support to encourage freedom of
expression of ideas for group consideration.
Attention is given to the application of research
findings to everyday problems rather than to an
analysis of the values of specific methodology of
research relating to children. The focus of the class
is on everyday happenings, normal development
of children, and common forms of guidance. Little
attention is given to the technical aspects of
learning theory; rather, consideration is given to
learning as observed through the eyes of parents.

Because of the bias which might have been in- 
troduced into the data had the investigator taught 
one of the sections of the introductory course in 
child development and guidance, he was relieved
of his teaching responsibility for the course while 
the investigation was in progress. Three instruc- 
tors assumed the responsibility of teaching four 
sections of the course comprising the experiment- 
al treatment. Although the instructors were aware 
that their students were participating in a research 
study, they were not given the details of the inves- 
tigation, nor were they aware of the nature of the 
instruments being used in the evaluation. 


Results 


The initial mean scores obtained on the USC 
Parent Attitude Survey and on the Child Guidance 
Survey by 156 home economics majors classified
according to various subgroups are presented in
Table II; the differences between mean scores of
experimental and control groups, in Table III; and
the changes in scores between final and initial
tests, in Table IV.

The data indicate that the responses of the sub- 
jects on the USC Parent Attitude Survey were not 
unlike those obtained by Shoben’s (10) parents of 
‘‘non-problem’’ children. The responses of the 
subjects on the Child Guidance Survey were not un- 
like those obtained by Wiley’s (14) students in ed- 
ucation and public speaking classes, that is, his 
relatively ‘‘unsophisticated’’ subjects. 

The differences between the experimental and 
control groups at the onset of the experiment ap- 
peared to be small. As stated previously, differ- 
ences with respect to scholastic aptitude, year in 
school, socio-economic status, and academic
achievement were not statistically significant. Nor 
was there a significant difference between the 
mean scores obtained by the experimental and con- 
trol groups on the USC Parent Attitude Survey. 
However, at the onset of the experiment the difference
between scores obtained on the Child Guidance
Survey by the experimental and control
groups was significant at the one percent level of 
confidence, indicating more favorable attitudes on 
the part of the experimental subjects at the time 
of the initial testing. A consideration of equating 
the two groups by an arbitrary exclusion of certain 
subjects was abandoned because of the possibility 
of disturbing the variance, thus biasing the results. 
Instead, it was decided to view the changes in re- 
lation to the obtained difference between the initial 
scores. 

TABLE II

INITIAL SCORES OF STUDENTS

                                      USC Parent Attitude       Child Guidance
                                            Survey                  Survey

                                      Initial    Level of     Initial    Level of
Group                           N      Mean     Confidence     Mean     Confidence

Experimental                    76     333.                    429.        .01
Control                         80     338.                    449.

Upper-Middle                    66     337.                    436.
Lower-Middle                    69     333.                    440.

ACE: 50 percentile or above     53     331.                    433.
ACE: Below 50 percentile       103     338.                    443.

Rural                           72     335.                    440.
Urban                           84     336.                    439.

Two children or less            76     337.                    444.
More than two children          80     335.                    458.

Only                            25     329.                    435.
Oldest                          49     337.                    445.

Only                            25     329.                    435.
Middle                          38     335.                    439.

Only                            25     329.                    435.
Youngest                        44     338.                    436.

Oldest                          49     337.                    445.
Middle                          38     335.                    439.

Oldest                          49     337.                    445.
Youngest                        44     338.                    436.

Middle                          38     335.                    439.
Youngest                        44     338.4                   436.

GPA: 3 pt. or above             50     333.                    437.
GPA: Below 3 pt.               106     337.                    440.

Very happy                      86     338.                    444.
Happy                           36     339.                    441.

Very happy                      86     338.                    444.
Average                         30     325.                    429.

Happy                           36     339.                    441.
Average                         30     325.                    429.

Significant changes in attitudes concerning the
guidance of children between initial and final tests
were noted in both the experimental and control
groups on both instruments, suggesting that maturity
and/or factors which were uncontrolled in
the investigation may have contributed to changes
in attitude. Although there was a difference of
approximately 20 points at the time of the initial
testing between experimental and control groups
on the Child Guidance Survey in favor of the
experimental group, at the end of the semester there
was a difference of approximately 50 points between
experimental and control groups, indicating
significantly greater gains for the experimental
group.
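
The comparison of gains in the preceding paragraph can be made explicit with a small worked sketch. The initial means are those printed in Table II (their decimals are not legible in the source); the final means are hypothetical values chosen only so that the final gap is about 50 points, as reported. On the Child Guidance Survey a lower score reflects more favorable attitudes.

```python
def gain(initial_mean, final_mean):
    """Improvement on a scale where LOWER scores are more favorable."""
    return initial_mean - final_mean


# Initial Child Guidance Survey means as printed in Table II.
exp_initial, ctl_initial = 429.0, 449.0

# HYPOTHETICAL final means, chosen only to reproduce the reported
# gap of approximately 50 points at the end of the semester.
exp_final, ctl_final = 380.0, 430.0

initial_gap = ctl_initial - exp_initial   # about 20 points
final_gap = ctl_final - exp_final         # about 50 points

# The experimental group's extra gain equals the widening of the gap.
extra_gain = gain(exp_initial, exp_final) - gain(ctl_initial, ctl_final)
print(initial_gap, final_gap, extra_gain)
```

Viewing the change in relation to the initial difference, as the investigator chose to do, amounts to attributing to the course only the widening of the gap (about 30 points) and not the full final difference.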

Socio-Economic Status—The differences ob- 
tained between means of 66 upper-middle and 69 
lower-middle class students on the USC Parent 
Attitude Survey and on the Child Guidance Survey 
were not statistically significant. The numbers
of students in the upper class and in the
upper-lower class in the present study were too small
to warrant other comparisons.

A difference between the initial means obtained 
on the Child Guidance Survey by the experimental 
and control subjects of the upper-middle class is 
significant at the one percent level of confidence, 
the experimental subjects having obtained a score
reflecting more favorable attitudes concerning 
the guidance of children. 

A comparison of the changes in scores between 
initial and final tests of experimental and control 
groups in the upper-middle and lower-middle so- 
cio-economic levels reflects significant gains in 
the upper-middle and lower-middle classes in the 
control group as evidenced by responses on the 
USC Parent Attitude Survey. The responses of 
the experimental subjects on the Child Guidance 
Survey, however, reflect gains significant at the 
one percent level of confidence in both the upper- 
middle and lower-middle socio-economic groups. 
The control subjects in the upper-middle class 
evidenced a gain significant at the five percent 
level of confidence. 

Scholastic Aptitude—The American Council 
on Education Psychological Examination which the 
students complete upon their entrance to Okla- 
homa A. and M. College was utilized as an index 
of scholastic aptitude. Students ranking at the 
fiftieth percentile or above on the A.C.E. Exami- 
nation obtained a significantly better mean score 
on the USC Parent Attitude Survey than did stu- 
dents below the fiftieth percentile. Responses on 
the Child Guidance Survey did not reflect a com- 
parable difference, however. 

Statistically significant differences were noted
between mean scores obtained on both instruments
by experimental and control students at the
fiftieth percentile or above on the A.C.E. Examination,
the experimental subjects indicating more
favorable attitudes concerning the guidance of
children. No significant difference, however, was
noted between the initial mean scores obtained on
either instrument by experimental and control
subjects who were below the fiftieth percentile on
the A.C.E. Examination.

The changes in scores between initial and final 
tests of experimental and control groups of differ- 
ent levels of scholastic aptitude in general indi- 
cate significant gains in both experimental and con- 
trol groups irrespective of scholastic aptitude. 

Rural-Urban Residence— The initial mean 
scores obtained by students from rural and urban 
areas do not reflect a significant difference be- 
tween the two groups with respect to attitudes con- 
cerning the guidance of children. 

A comparison of the initial mean scores ob- 
tained by students from rural and urban areas in 
the experimental and control groups reveals no 
statistically significant differences on the USC 
Parent Attitude Survey. On the Child Guidance 
Survey, however, in both the rural and urban 
groups the experimental subjects obtained signifi- 
cantly lower scores, indicating more favorable 
attitudes concerning the guidance of children. 

The changes in scores between initial and final
tests of experimental and control groups in gener- 
al indicate significant gains in both groups irre- 
spective of rural-urban residence. 

Size of Family— The differences between the means
of students who were reared in families of two or
fewer children and students who were reared in
families with more than two children were not
statistically significant on the USC Parent Attitude
Survey. A difference in mean scores on the Child 
Guidance Survey, however, is significant at the 
five percent level of confidence, the students from 
smaller families indicating more favorable atti- 
tudes. 

The difference obtained on the Child Guidance
Survey between the students in the experimental 
and control groups who were reared in families 
with more than two children is significant at the 
one percent level of confidence, the experimental 
subjects evidencing more favorable attitudes con- 
cerning the guidance of children. 

In general, the changes in scores on both instruments
between initial and final tests of experimental
and control subjects from families with
two or fewer children and from families with more 
than two children indicate significant gains in both 
the experimental and control groups irrespective 
of the number of children in the family. 

Ordinal Position— Little relationship was noted 
between ordinal position and attitudes concerning 
the guidance of children. The initial mean scores 
obtained by students of different ordinal positions
indicate only one statistically significant difference.
On both the USC Parent Attitude Survey and
the Child Guidance Survey significant differences 
were obtained between experimental and control 
subjects who were oldest children in their fami- 
lies, the experimental subjects evidencing the 
more favorable attitudes concerning the guidance 
of children. 
An inspection of the changes in scores on the
USC Parent Attitude Survey between initial and
final tests of experimental and control subjects of
different ordinal positions fails to reveal a
superiority with respect to gain in the experimental
group. On the Child Guidance Survey, however,
only children, middle children, and youngest children
in the experimental group evidenced significant
gains while the control group subjects in these
positions did not.

Academic Achievement— The grade-point average
which the subjects earned during their freshman
year in college was utilized as an index of
academic achievement. When the scores of the
students with a grade-point average of 3 or above (a
3 point being equivalent to a grade of B) were
compared with those with a grade-point average
below 3, no significant differences were noted
with respect to attitudes concerning the guidance
of children.

No significant differences were noted between 
experimental and control groups for those stu- 
dents whose freshman grade-point average was 3 
or above. Statistically significant differences, 
however, were obtained between experimental and 
control groups for those students whose freshman 
grade-point average was below 3, the experiment- 
al group holding more favorable attitudes concern- 
ing the guidance of children. 

The changes in scores between initial and final 
tests of experimental and control groups of differing
levels of academic achievement indicate, in
general, significant gains in both the experiment- 
al and control groups irrespective of academic 
achievement as measured by the grade-point av- 
erage attained at the freshman level. 

Perceptions of Childhood Happiness— The initial
mean scores obtained on the USC Parent Attitude
Survey by students with different perceptions
concerning the happiness of their own childhood
indicate that students who perceive their childhood
to have been ‘‘average’’ hold attitudes concerning
the guidance of children which are more favorable
than those of students who perceive their childhood to
have been ‘‘happy’’ or ‘‘very happy.’’ Responses
to the Child Guidance Survey, however, do not
reflect such differences. In general, the data do
not reflect significant differences between
experimental and control groups.

The changes in scores between initial and final
tests of experimental and control groups with different
perceptions of childhood happiness indicate that
there is little consistency between findings of the
two instruments, with the exception of those students
who rated their childhood to have been
‘‘very happy.’’ The gains made by both the
experimental and control subjects who rated their
childhood to have been ‘‘very happy’’ were statistically
significant.

The validity of the happiness ratings obtained
at the beginning of the semester is doubtful. When
the responses on the happiness rating scales obtained
at the beginning of the semester were compared
with those obtained at the end of the semester,
the proportion of agreement between the
responses of the experimental group was .64, and
the proportion of agreement between the
responses of the control group was .66. The data
for the two groups were treated separately in this
regard because it was assumed that the perceptions
of the students enrolled in the child development
class might change more than those of the
students in the control group. The evidence,
however, does not support this assumption.
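
The agreement figures reported above are simple proportions of students giving the same rating on both occasions. A minimal sketch follows; the five ratings shown are hypothetical and the function name is an illustrative choice, since the study's individual responses are not available.

```python
def proportion_agreement(first_ratings, second_ratings):
    """Share of subjects whose two ratings are identical."""
    if len(first_ratings) != len(second_ratings):
        raise ValueError("both administrations must cover the same subjects")
    if not first_ratings:
        raise ValueError("at least one subject is required")
    matches = sum(a == b for a, b in zip(first_ratings, second_ratings))
    return matches / len(first_ratings)


# HYPOTHETICAL happiness ratings for five students, beginning and
# end of the semester, in the same student order.
september = ["very happy", "happy", "average", "very happy", "happy"]
january = ["very happy", "average", "average", "happy", "happy"]

print(proportion_agreement(september, january))
```

A figure near .65 means that roughly one student in three changed her rating over a single semester, which is the basis for doubting the stability of the ratings.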


Discussion 


Any interpretation with respect to the causes
of the changes in attitudes evidenced in the control
group must necessarily be regarded as speculation.
There is the possibility that such gain
reflects mere maturation. Also, every research
worker in the social sciences is acutely aware that 
in experimental work only a portion of the signifi- 
cant variables are adequately controlled. The 
problem, in part, is in knowing what factors are 
relevant as well as in being able to control them. 
In the present study, for example, even though 
one might logically assume randomization with re- 
spect to the courses in which the experimental and 
control subjects were enrolled, the fact remains 
that during the time the experimental subjects 
were enrolled in the introductory course in child 
development and guidance, the control subjects, 
for the most part, were enrolled in other courses
carrying approximately the same credit. Undoubt- 
edly, some of these were home economics courses 
which although not specifically concerned with 
child development and guidance were ‘‘family cen- 
tered’’ in their approach. This may have contrib- 
uted to the modification of attitudes toward the 
guidance of children. 

In the study reported herein, it is important to
remember that the gain of the experimental group 
was significantly greater than the gain of the con- 
trol group as reflected by responses to the Child 
Guidance Survey. This indicates that the ‘‘exper- 
imental treatment’’ which consisted of an introduc- 
tory course in child development and guidance 
was particularly effective in producing the changes 
which were desired. There was little difference 
between experimental and control subjects in 
terms of amount of change in attitudes as meas- 
ured by the USC Parent Attitude Survey, however. 
But it should be noted that the initial scores obtained
on the USC Parent Attitude Survey compare
favorably with those of parents of ‘‘non-problem’’ 
children reported by Shoben (10), and that they 
compare favorably with those of 207 undergradu- 
ate men reported by Walters and Bridges (12). 

The present study demonstrates that certain attitudes
concerning the guidance of children can be
modified during the course of a semester. In the
opinion of the investigator, it reflects what may 
be achieved in schools throughout the country in 
which there is a sincere desire to promote the 
welfare of children. Since one of the purposes of 
education for family living is the modification of 
attitudes, it would seem that similar assessments 
of the attitudes of young people at the secondary 
level as well as of men and women in colleges and 
universities might well serve as an important ba- 
sis for curriculum planning. 
JUDGMENT TEST

The Problem


THE PURPOSE of this investigation is to study 
the predictive effectiveness of a teacher judgment 
test with particular reference to grade-point av- 
erage in professional courses, student-teaching 
grades and efficiency ratings by school supervi- 
sors given after six months of teaching. The study 
will concern itself with a so-called judgment test 
of a situational type, wherein certain facts are 
given and various sorts of judgments are sought. 
In making this approach it is thought that teachers
must make many judgments in teaching and the 
quality of these judgments will correlate positive- 
ly with the effectiveness of teachers. 

The effectiveness of teachers will be measured 
by subjective evaluations through the Wisconsin 
Adaptation of the M-Blank by supervisors of
schools. More specifically, this study will at- 
tempt to provide the evidence relative to the fol- 
lowing questions: 


1. Are scores on the judgment test predictive 
of success in professional courses? 

2. Are scores on the judgment test predictive 
of student-teaching grades? 

3. Are scores on the judgment test predictive 
of supervisors’ rating of teachers at the end 
of a six-month period?


Three criteria will be employed for predicting 
teaching success. They are: (a) grade-point av- 
erages in professional education courses, (b) stu- 
dent-teaching grades, and (c) principals’ or su- 
pervisors’ judgments on the effectiveness of teach- 
ers. During this investigation, subjective and ob- 
jective data will be used. It is hoped that these 
data can be used for predicting future teaching 
success. 


Importance of the Problem 





*All footnotes will be found at end of article.

The current demand for teachers is by far
greater than the supply available. This has not
always been true. Twenty years ago a superintend-
ent of schools could select from a large group of
candidates, who applied for the teaching position, 
the one he thought would be best fitted for the job. 
This meant that the superintendent could select 
the candidate that promised to develop into a suc- 
cessful teacher, based on superior capacities, 
training, and apparent potentialities. Because it 
is true that most graduates of a teacher-education 
institution can find teaching jobs today, it is a ne- 
cessity that communities, schools, and teacher- 
education institutions assume joint responsibility 
to admit only those candidates for a future teach- 
ing career who possess superior qualifications, 
outstanding character traits and favorable person-
al qualities. Herbert I. Von Haden¹ commented
on this subject in the following manner: ‘‘Although
the present under supply of teaching candidates
may make it appear that pretraining selection is
not as urgent as it was when the supply of certifi- 
cate holders was more abundant, it might be ar-
gued that existing conditions make it even more
necessary than formerly for training institutions 
to exercise greater care in the selection of their 
students. Certainly the ultimate welfare of the 
boys and girls who will come under the guidance 
and tutelage of the candidates in a future teaching 
position is fundamental and paramount. Despite 
the present critical shortage, those who are re- 
sponsible for long-term planning in education dare 
not lose sight of the need for the improvement of 
the quality of instruction in the schools and of the 
personnel to carry out that instruction. More val- 
id and reliable instruments of selection, then, are 
essential for the protection of boys and girls 
through the general improvement of the level of 
the teaching profession. The responsibility for 
assuring the highest possible quality of instruction 
rests jointly with the institutions training candi- 
dates, the agencies entrusted with the certifica- 
tion of teachers, and the administration charged 
with the selection of personnel.’’


In 1935, Barr² emphasized that one of the most
effective means for improving the quality of in-
struction in schools is to admit only those of su-
perior potential teaching ability. The questions
then arise: 


1. Can teaching efficiency be predicted? 

2. Is achievement in various educational or 
professional courses an indication of later 
teacher success?

3. What instruments will serve as an adequate
basis for predicting success in teaching? 


School programs, teacher placement agencies 
and teacher-education institutions would be aided 
immensely if a valid instrument could be devel-
oped for predicting teaching success. Because of
the complexity of the teaching-learning process,
however, no one instrument has yet been found to 
accomplish this desired goal. Many attempts
have been made to forecast the success of appli-
cants for teaching positions. The most common
procedures are to evaluate application blanks, 
questionnaires, college grades, physical and emo- 
tional fitness, and to observe the teacher at work. 
Even after a teacher is on the job, sound instru- 
ments and techniques of evaluating and measur- 
ing teaching effectiveness are of great aid for the 
in-service education of teachers. Materials, pro- 
cedures and methods need to be evaluated from 
time to time to keep abreast of modern develop-
ments in the teaching field. To do this effective-
ly, sound measuring devices of teaching efficien-
cy are of utmost importance.


Review of Previous Investigations 





In the past many studies have been made of the 
relationship of certain teacher competencies to 
various criteria of teaching success. Barr³ pub-
lished in 1948 a summary of one hundred and
thirty-eight such studies. There are five broad 
categories into which these studies can be classi- 
fied: 


1. General investigations. 

2. Pupil growth and change as a measure of 
teaching success. 

3. Pupil ratings of the teacher as a criterion 
of teaching success. 

4. Supervisory ratings of the teacher as a meas-
ure of teaching success.

5. Personal fitness of the teacher as measured
by respective tests in the areas of attitudes,
knowledge of the subject matter, personal-
ity and temperament.


The studies which are categorized under 3 above 
and the studies that use judgment or intelligence 
tests of some sort as predictors of teaching effi- 
ciency will be reviewed. Investigations that use 
ratings of supervisors, grade-point average in
professional courses and student-teaching grades
as criteria were reported in this study. The nu-
merous studies made in the past will not be cited
here. The reader is referred to Barr’s article
or to George J. Schick’s doctoral dissertation on
file at the University of Wisconsin library.

In the fall of 1955, teacher judgment tests were 
given to one hundred and forty-three seniors en- 
rolled in the School of Education at the University 
of Wisconsin. The population of this study con-
sists of seventy-two teachers for whom supervis-
ory ratings could be obtained. Since no random
sample is used in this study, no statistical infer- 
ences will be drawn; the study is a descriptive, 
predictive investigation. Of the original one
hundred and forty-three students, eighty-six se-
cured elementary or high school teaching posi-
tions, and supervisory ratings could be ascer-
tained for seventy-two of them.

Student-teaching grades, grade-point average 
in professional courses, overall grade-point aver- 
age in four years of college, and psychological exam- 
ination scores were collected and tabulated for the 
teachers under consideration. 

The ‘‘so-called judgment test’’ was adminis- 
tered to the students. In making this approach it 
is thought that teachers must make many judg- 
ments in teaching and the quality of these judg- 
ments will correlate positively with the effective- 
ness of the teachers. Judgment is here defined as 
the ability to form an opinion or conclusion about 
a course of action from circumstances presented 
to the person. It is not essential in this investiga-
tion that one establishes the fact that the test em-
ployed actually measures judgment since the pri-
mary concern is with the predictive value of this
instrument regardless of what it might claim to 
measure. 


Criteria of Teaching Efficiency 





Many criteria of teaching efficiency have been 
used in past studies. Von Haden⁴ discusses
the criterion of ratings of teaching efficiency in 
the following way: ‘‘The evaluation of instruction 
upon the basis of information gathered through ob- 
servational devices often loses sight of the fact 
that what the teacher does may not be as signifi-
cant as how he does it. There is also the possibil-
ity that the whole is more than the sum of the parts
in its effect upon the outcomes of instruction. 
Then too, a specific teaching act may not be good 
or bad in itself, but rather in relation to the total 
situation in which it is used—the conditions which 
give rise to it and the purpose for which it is em- 
ployed. This appropriateness factor is frequent- 
ly given inadequate consideration... 

‘‘As in the case of the activities approach, the
use of ratings of qualities is predicated upon the
assumption that the teacher’s superior, his stu-
dents, co-workers, or some other rater is in a
position to detect qualities and to evaluate them.
The problems involved in determining the quali-
ties to be considered and in objectifying the instru-
ments employed in making the evaluations have
long been recognized. The extent to which out- 
standing proficiency in one respect compensates 
for deficiency in another, has, however, not been 
established... Qualities, like procedures, may 
be closely dependent upon the situations that give 
rise to them and as a consequence may have to be
measured in specific situations. 

‘‘The complexity of the problem of establishing 
a criterion of teaching success is rooted in the 
varied nature of the teacher’s work and the conse-
quent range of qualifications.’’

While much has been achieved in this area it 
is still quite difficult to control the many factors 
aside from the teacher which influence the learn- 
ing processes of pupils. No highly valid and reli- 
able measures are yet available to measure pupil 
growth. 

Teacher ratings by supervisors have always 
been under attack as far as their validity and reli- 
ability are concerned. It is clearly recognized 
that different supervisors have different criteria 
on what constitutes teaching efficiency in differ-
ent situations. But as long as they are employed
by their local board of education, as long as they
represent particular communities, and as long as
they judge what is acceptable or not acceptable in 
teaching, their ratings should be given serious 
consideration. 


The Criteria of Effectiveness Employed in This Study


Three criteria will be employed in this investi- 
gation, namely, 





1. Grade-point averages as a measure of effec- 
tiveness in professional courses, 

2. Student-teaching grades as a measure of 
pre-service teaching effectiveness, and 

3. Principals’ or supervisors’ judgments of
the in-service effectiveness of the teachers as
measured by the Wisconsin Adaptation of the M- 
Blank. 


These criteria are not without shortcomings, 
but they constitute three almost universally used 
criteria of professional competency as judged on 
the basis of pre-service preparation and early 
teaching experience. 

This study involves only these three criteria. 
The claims of this investigation do not pertain to 
the general efficiency of teachers but to their effi- 
ciency as indicated by these criteria. Criterion 
1 will be used because proficiency in professional 
courses is judged important in teaching success. 
Criterion 2 will be used because student-teaching
provides a pre-service opportunity to observe the
teacher in action. Criterion 3 will be used be-
cause the supervisors’ judgments are of great im-
portance for the particular situation in which the
teacher finds himself or herself.

If principals or supervisors think that certain 
qualities make a good teacher, and if the success 
or failure of teachers depends upon the possession 
of them, then they are ultimately of essential im- 
portance to the teacher. If teachers are hired, 
promoted, evaluated and fired by supervisors who 
make judgments on their teaching efficiency, it is
of little value to the teachers to know that there ex-
ists a more valid criterion that would reveal their
superior ability in teaching. 


Statistical Design of the Study 








In the preceding section the data-gathering de- 
vices were explained. These data were analyzed 
through the use of the coefficients of correlation, 
mean, standard deviation, F-test, and in some 
cases, through partial correlation, multiple corre- 
lation and an analysis of variance. The judgment 
tests were scored in three different ways. The
group of seventy-two teachers for whom responses
could be secured was divided into random half A
and random half B. Three scoring keys were de-
rived. One was a ‘‘rational’’ key based upon the
answers given to the several test items by three 
professors and two advanced graduate students. 
There were two different empirical scoring keys, 
one derived from sub-group A and another from 
sub-group B, using the mean as weights. An addi-
tional scoring key was derived from group A using
the mean divided by the unbiased estimate of the popula-
tion variance as weights.

In the following section, empirical keying and 
cross-validation will be explained in detail. Cross-
validation took place whenever a key was derived
from one group, the second group was scored
with it, and these scores were correlated with
their corresponding M-Blank ratings. The cross-
validation referred to here was carried out with
seventy-two cases for whom the data were com- 
plete. 

A total of one hundred and forty-three seniors 
took the judgment test in the fall of 1955. Of these, 
forty-two were not teaching in 1957, fifteen were 
in the Armed Forces and eighty-six secured teach-
ing positions. Of the eighty-six who were teaching,
supervisory ratings on the Wisconsin Adaptation
of the M-Blank could be obtained for seventy-two.
This constituted eighty-four percent of the teachers
for whom follow-up letters were written.


Empirical Keying and Cross-Validation 


It is believed that the use of empirical keying 
and cross-validation should be more generally ap-
plied in prediction studies of this kind.⁵ As pre-
viously mentioned, 143 seniors enrolled at the Un-
iversity of Wisconsin in the fall of 1955 took a
teacher judgment test. One year later, when these
seniors were teachers, follow-up letters were writ-
ten to their supervisors and superintendents. The su-
pervisors and superintendents were asked to give
efficiency ratings of these teachers on the Wiscon- 
sin Adaptation of the M-Blank. Only those teach- 
ers for whom M-Blank ratings could be obtained 
were selected for this study. From the 72 teach- 
ers for whom M-Blank ratings could be obtained, 
random halves were used to derive empirical keys.

All possible response options were listed for 
each item of the judgment test, including ‘‘omit- 
ted all options for that item’’ as an option itself. 
Then, for each option a weight was found. This 
was done by summing all the scores of M-Blank 
ratings given by supervisors of those teachers 
who marked a particular option of the test item. 
This derived sum was divided by the number who 
checked that particular option and thus the mean 
score of the M-Blank rating of those teachers who 
marked the mentioned option was found. In this 
manner a weight could be found for all marked op- 
tions. 

One would also need to know what weight an op- 
tion for a question should receive if it was not 
checked by anyone in group A but was checked by 
someone in group B. In order to answer this ques-
tion, the following procedure was applied. The
weights of all options marked were multiplied by 
the number of cases upon which they were based. 
The sum of all these products was then divided 
by the total number of cases, which made up the 
various option weights in a particular question. 
This newly derived weight was used for any and 
all options that were not checked. Thus, if option
‘‘omit’’ for a question had no weight, meaning
that everyone checked some other option for that
question, this newly derived weight was used for
it. Why do we need a weight for an option ‘‘omit’’
of a question if nobody really omitted the ques-
tion? It is true that for that group from which the
scoring key was derived a weight for an option
‘‘omit’’ was not necessary. But a key derived from
one group was used to score a second group,
and for that second group a possible answer to
the above question was ‘‘omit’’ and a weight must
be available if anyone in the second group actual-
ly omitted that question.
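The keying steps just described can be sketched in code. What follows is a minimal illustration in Python, not the investigator's own computation: the function names `derive_key` and `score`, the data layout (one chosen option per item per teacher, with "omit" treated as an ordinary option), and the toy ratings are all assumptions made for the example.

```python
from collections import defaultdict

def derive_key(responses, ratings):
    """Build an empirical key from one group.

    responses: list of answer lists, one option label per item per teacher.
    ratings:   criterion scores (here standing in for M-Blank ratings).
    Returns, per item, a pair (option weights, fallback weight).
    """
    n_items = len(responses[0])
    key = []
    for item in range(n_items):
        totals = defaultdict(float)
        counts = defaultdict(int)
        for answers, rating in zip(responses, ratings):
            totals[answers[item]] += rating
            counts[answers[item]] += 1
        # Weight of an option = mean criterion rating of those who marked it.
        weights = {opt: totals[opt] / counts[opt] for opt in totals}
        # Fallback for options nobody marked: the option weights averaged
        # with the number of cases behind each weight as multiplier.
        fallback = sum(weights[o] * counts[o] for o in weights) / sum(counts.values())
        key.append((weights, fallback))
    return key

def score(answers, key):
    """Score one answer list: sum the weight of the option marked on each item."""
    return sum(weights.get(opt, fallback)
               for opt, (weights, fallback) in zip(answers, key))

# Hypothetical toy data: three teachers, two items.
toy_responses = [["a", "b"], ["a", "c"], ["b", "b"]]
toy_ratings = [10.0, 20.0, 30.0]
toy_key = derive_key(toy_responses, toy_ratings)

print(score(["a", "b"], toy_key))     # a teacher from the keying group
print(score(["omit", "c"], toy_key))  # "omit" was never marked, so the fallback applies
```

In this sketch the fallback weight for an item equals the mean rating of everyone who answered that item, which is what the weighted average in the text reduces to.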

Random half A was used first in deriving the 
empirical scoring key using the mean as weights. 
Both groups A and B were scored with this key. 
Since the key was derived from group A, the 
scores of group A contained a bias. The unbiased
scores of group B were used for cross-validation.
The second half was then used in deriving a differ- 
ent empirical scoring key and the first half was 
used for cross-validation. This process is known 
as double cross-validation and was used in this in-
vestigation. The two halves are then independent
of each other and a new group will not have to be
found for cross-validation. This procedure saves
time, as generally a year is required before it is
known who the new graduates are and which gradu-
ates could be used for cross-validation purposes.
Moreover, new conditions, arising during the year
following establishment of an empirical key, might
no longer make the two groups of graduates com-
parable. The above procedure was achieved by
scoring the second half with a key derived from 
the first random half and correlating these scores 
with their corresponding M-Blank ratings of super-
visors. This coefficient of correlation was called
$r_B$. The biased scores of group A were also cor-
related with their corresponding M-Blank ratings
and the coefficient of correlation was called $r'_A$.
The prime indicated it was biased and the sub-
script identified the group. Thus, $r_B$ was the co-
efficient of correlation of the cross-validated
group B, or the unbiased validity measure for group
B. A coefficient of equivalence, a measure of re-
liability, was found by using Cronbach’s alpha for
both groups.
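The four validity coefficients produced by double cross-validation can be illustrated with a short sketch. It assumes the test scores under each key are already in hand; the function names, the dictionary layout and the toy numbers are hypothetical and serve only to show which score list is paired with which ratings.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def validity_coefficients(scores, ratings):
    """scores[(group, key_source)] holds the test scores of `group` under the
    key derived from `key_source`; ratings[group] holds that group's criterion
    ratings. A coefficient is primed (biased) when a group is scored with its
    own key and unprimed (cross-validated) otherwise."""
    out = {}
    for (group, key_source), test_scores in scores.items():
        label = ("r'_" if group == key_source else "r_") + group
        out[label] = pearson(test_scores, ratings[group])
    return out

# Hypothetical toy data, four teachers per half.
toy_ratings = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8]}
toy_scores = {
    ("A", "A"): [1, 2, 3, 4],  # group A under its own key (biased)
    ("A", "B"): [1, 3, 2, 4],  # group A under key B (cross-validated)
    ("B", "A"): [1, 3, 2, 4],  # group B under key A (cross-validated)
    ("B", "B"): [1, 2, 3, 4],  # group B under its own key (biased)
}
coefficients = validity_coefficients(toy_scores, toy_ratings)
bias_a = coefficients["r'_A"] - coefficients["r_A"]
print(coefficients, bias_a)
```

The difference between a primed coefficient and its unprimed counterpart is the bias attributable to scoring a group with a key derived from that same group.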

By looking at the various validity measures $r_A$,
$r'_A$, $r_B$, $r'_B$, which were actually computed, one
could ask the questions: What is the validity meas-
ure for the combined biased groups? What is the
actual bias of group A that resulted when group A
was scored with a scoring key derived from group
A as compared with the unbiased, cross-validated
value of $r_A$? What is the actual bias of group B
that resulted when group B was scored with a scor-
ing key derived from group B as compared with
the unbiased, cross-validated value of $r_B$?

In order to answer these questions it is evident
that one cannot simply add the two cross-validated
coefficients of correlation and divide by two. They
are based on different groups, with different means
and standard deviations. To combine $r_A$ and $r_B$
into one estimate, Fisher’s z-transformation was
used. The reason for this transformation was, as
Fisher⁶ points out: ‘‘The transformation leads ap-
proximately to a normal distribution. The advan-
tage of this transformation lies in the distribution
of the two quantities in random samples. The
standard deviation of r depends on the true value
of the correlation and the standard error of z is
practically independent of the value of the correla-
tion.

‘‘In the second place the distribution of r is
skew in small samples, and even for large sam- 
ples it remains very skew for high correlations. 
The distribution of z is not strictly normal, but it 
tends to normality rapidly as the sample is in- 
creased, whatever may be the value of the corre- 
lation. The simple assumption that z is normally 
distributed will in all ordinary cases be sufficient-
ly accurate.’’
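Combining two correlations through the z-transformation can be sketched as follows. The text does not say how the two z values were weighted, so the weights of n − 3 used here (the reciprocal of the approximate sampling variance of z) are an assumption, as are the function name and the illustrative correlations, which are not results from the study.

```python
import math

def combine_correlations(pairs):
    """Pool correlations through Fisher's z.

    pairs: iterable of (r, n) with n the number of cases behind each r.
    Each r is transformed to z = atanh(r), the z values are averaged with
    weights n - 3 (an assumed, conventional choice), and the mean z is
    transformed back to a correlation with tanh.
    """
    pairs = list(pairs)
    weighted_z = sum((n - 3) * math.atanh(r) for r, n in pairs)
    total_weight = sum(n - 3 for _, n in pairs)
    return math.tanh(weighted_z / total_weight)

# Hypothetical values for two random halves of 36 cases each.
pooled = combine_correlations([(0.20, 36), (0.40, 36)])
print(round(pooled, 3))
```

Because the transformation is nonlinear, the pooled value differs slightly from the simple average of the two coefficients, which is the point made in the paragraph preceding the quotation.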

A combination of the two independent unbiased
validity coefficients $r_A$ and $r_B$ was found. This
value was called rj. This was also computed for
TABLE II

JUDGMENT TEST DATA SCORED WITH AN EMPIRICAL KEY
USING THE MEAN AS WEIGHTS

Group                                Key A         Key B    Student Sums

Group A
  Sum                                43771         40912           84683
  Squares                         53232297      46496482       199217739
  Mean                                1216          1136            1176
  Standard Deviation                 19.15          8.05

Group B
  Sum                                43786         41058           84844
  Squares                         53261712      46835856       199974230
  Mean                                1216          1140            1178
  Standard Deviation                 12.84         16.27

Overall
  Overall Sum                        87557         81970          169527
  Overall Squares                106494009      93332336       399191969
  Overall Mean                        1216          1138            1177
  Overall Standard Deviation         16.20         12.87
  Biased or Sequence Sum             84829         84698
TABLE IV

BIASED AND UNBIASED SCORES OF THE JUDGMENT
TEST USING THE MEAN AS WEIGHTS

Scoring with Key Derived          Scoring with Key Derived
Empirically from Group A          Empirically from Group B

Biased Scores of Group A          Unbiased Scores of Group A

Unbiased Scores of Group B        Biased Scores of Group B





TABLE V

BIASED AND UNBIASED COEFFICIENTS OF CORRELATION SCORING THE JUDG-
MENT TEST USING THE MEAN AS WEIGHTS

                                  M-Blank Ratings    M-Blank Ratings
                                  of Group A         of Group B

Biased Scores of Group A          $r'_A$

Unbiased Scores of Group A        $r_A$

Biased Scores of Group B                             $r'_B$

Unbiased Scores of Group B                           $r_B$
the two biased ‘‘validity’’ coefficients $r'_A$ and $r'_B$.
The improved estimate was called rg. The differ-
ences $r'_A - r_A$ and $r'_B - r_B$, the actual bias, were
also calculated. The corresponding reliability
measures, $\alpha_A$, $\alpha'_A$, $\alpha_B$, and $\alpha'_B$, were also
computed.

Different means and standard deviations of bi- 
ased and unbiased scores were found. How do 
these different measures compare with each other? 
Is there a difference between groups, keys or bi-
ased and unbiased scores? To answer these ques-
tions a composite analysis of variance of the judg-
ment test scores with an empirical key using the
mean as weights was computed. This method was
adapted from the procedure used by Stanley⁷.

The procedure was used for the two random 
halves of group A and group B. Taking a subject
at random from group A and one from group B re-
sults in the simple latin square. As Stanley points
out: ‘‘There will be half as many such latin
squares as there are subjects in the investigation,
or as many as there are individuals in either
group. It is more convenient to set up the scores
for analysis in the form of Table I, where the two
groups are kept separate. This is a consolida-
tion of n latin squares.’’

Table II gives all the scores of the two groups. 
Table III shows the procedure of analysis. For 
more detail the reader is referred to Stanley’s? 
article. 
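The way the entries of Table II hang together can be checked with a short computation. The sketch below recomputes the mean and standard deviation for group A from the tabled sums and sums of squares. It assumes that each random half contains 36 teachers (72 divided in two; the half sizes are not stated explicitly) and that the standard deviation uses the n − 1 denominator; the function name is hypothetical.

```python
import math

def mean_and_sd(total, total_sq, n):
    """Mean and standard deviation from a sum and a sum of squares.

    Uses the n - 1 denominator (assumed here) for the variance:
    s^2 = (sum of squares - (sum)^2 / n) / (n - 1).
    """
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)

N_HALF = 36  # assumed size of each random half

# (Sum, Squares) for group A under each key, as given in Table II.
group_a = {"Key A": (43771, 53232297), "Key B": (40912, 46496482)}
summary = {key: mean_and_sd(s, sq, N_HALF) for key, (s, sq) in group_a.items()}

for key, (mean, sd) in summary.items():
    print(key, round(mean), round(sd, 2))
```

Under these assumptions the computed values agree with the means of 1216 and 1136 and the standard deviations of 19.15 and 8.05 tabled for group A.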

It was found, as shown in Table III, that there 
was no difference between groups and no differ-
ence between individuals within groups. But the
difference between the key derived from group A
and the key derived from group B was highly significant.
Immediately the question then arises why should 
that be the case if two random halves were used 
in determining the two keys? To answer this 
question let us look at the frequency polygons of 
scores of the two keys and one can readily see 
that the two distributions overlap only very little 
(in fact, only one score does). 

An F-test was computed and the difference of
means between the two group distributions of
scores was found to be highly significant. The
variances for the two keys were tested previous- 
ly for homogeneity and also were found to be sig- 
nificant at the one percent level for the cross- 
validated groups as reported earlier. But why 
this should be so is, as yet, not explained. Key 
A resulted in much higher judgment test scores 
than key B. The data indicated that key A had a
higher mean and standard deviation of M-Blank
ratings than did key B. Is this difference in means
of the supervisory rating statistically significant?
A t-test was applied and the two means were found
to differ appreciably but not significantly (t =
1.647 for 70 degrees of freedom). A test of homo-
geneity of the two variances also resulted in a val-
ue which was not significant. This was in accord-
ance with the assignment of the judgment test to
two random halves. Nevertheless, there was a
4.48 difference between the mean M-Blank of 
groups A and B. This meant that when the key de- 
rived from group A was used in scoring group A 
and group B (giving biased and unbiased scores) of 
the judgment tests, higher scores resulted than 
when the total group was scored with a key derived
from group B. This was then the reason why such
large differences in scores resulted for the total
group and why the analysis of variance yielded
such a highly significant value between key A and
key B. 

A second approach was made to illustrate that 
a still different empirical scoring key could be de- 
rived from the random halves. For this purpose 
an empirical key was derived only from group A. 
Instead of using the mean M-Blank rating for each 
option of every question as weights, the mean was 
divided by the unbiased estimate of the population 
variance of the distribution of M-Blank scores of 
those testees who checked a particular option. Val- 
idity and reliability measures were computed for 
the above scores. The 72 papers were also scored 
‘‘rationally’’. Three professors and two advanced 
graduate students derived a judgmental or rational 
key without regard to an external criterion. The 
familiar procedure of finding the proper weights 
for the various course grades via a z-transforma- 
tion was used. 
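The alternative weighting, in which the mean rating for an option is divided by the unbiased variance estimate of the ratings of those who checked it, can be sketched for a single test item. The function name and toy data are hypothetical. The text does not say how options checked by only one person, or options whose ratings all coincide, were handled; this sketch simply leaves them without a weight, which is an assumption.

```python
import statistics
from collections import defaultdict

def variance_weighted_option_weights(choices, ratings):
    """Weights for one item: mean rating / unbiased variance of the ratings
    of those who marked each option.

    choices: the option marked by each teacher on this item.
    ratings: the criterion ratings of the same teachers.
    Options with fewer than two cases or zero variance get no weight here
    (assumed handling; the source does not specify it).
    """
    by_option = defaultdict(list)
    for choice, rating in zip(choices, ratings):
        by_option[choice].append(rating)
    weights = {}
    for option, values in by_option.items():
        if len(values) < 2:
            continue  # variance undefined for a single case
        variance = statistics.variance(values)  # n - 1 denominator
        if variance == 0:
            continue  # division by zero avoided
        weights[option] = statistics.mean(values) / variance
    return weights

# Hypothetical toy data for one item.
toy_choices = ["a", "a", "a", "b", "b", "c"]
toy_item_ratings = [10.0, 12.0, 14.0, 20.0, 24.0, 30.0]
item_weights = variance_weighted_option_weights(toy_choices, toy_item_ratings)
print(item_weights)
```

Dividing by the variance gives less weight to options whose choosers received widely scattered ratings, even when their mean rating is high.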


Conclusions 


The conclusions will be drawn with reference 
to the three questions which were asked at the be- 
ginning of the investigation. 


1. Are scores on the judgment test predictive 
of success in professional courses? 


The coefficient of correlation between the judg-
ment test scores, scored with the ‘‘rational’’ key,
and the total scores in professional
courses was found to be .30. This indicates that
there is a positive relationship of moderate size
between the two variables. The answer to ques-
tion 1 is then: Scores on the judgment test can be used
for predicting success in professional courses.
Since the coefficient of correlation is of only mod- 
erate size, a regression equation would yield pre- 
dictive values in professional courses which would 
be relatively inexact. 
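How inexact such predictions would be can be illustrated with the usual large-sample expression for the standard error of estimate. This is a textbook formula applied to the reported coefficient, not a computation taken from the study; $s_y$ stands for the standard deviation of the criterion and $s_{y \cdot x}$ for the standard error of estimate.

```latex
s_{y \cdot x} = s_y \sqrt{1 - r^2}, \qquad
r = .30 \;\Rightarrow\; s_{y \cdot x} = s_y \sqrt{1 - .09} \approx .954\, s_y
```

On this reckoning a regression equation based on r = .30 would reduce the error of prediction by less than five percent relative to predicting the mean for everyone.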


2. Are scores on the judgment test predictive 
of student-teaching grades? 


The coefficient of correlation between judg-
ment test scores, scored with the ‘‘rational’’ key,
and the student-teaching grades was cal-
culated to be r = .11. This indicated that there
is a slight but insignificant relationship between
these two variables.
3. Are scores on the judgment test predictive
of supervisors’ rating of teachers at the end 
of a six-month period? 


The coefficient of correlation between the lat-
ter two variables was computed to be r = .21.
This value reveals a positive but statistically in- 
significant relationship between the two variables 
under consideration. The established correlation 
coefficient of r = .21 is again too small to be of 
any practical predictive value. But this r of .21
is about as large (or as small) as the r between 
M-Blank and grade-point average. This finding 
indicates that other variables would give as good, 
or even better, predictive measures. 
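The statement that r = .21 is not statistically significant can be checked with the usual t-test for a correlation coefficient. The sketch below assumes that the coefficient rests on the 72 teachers with M-Blank ratings and uses the tabled two-tailed five percent critical value of about 1.994 for 70 degrees of freedom; neither the sample size behind this particular coefficient nor the test used is stated in the text, and the function name is hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for testing a correlation against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

N_TEACHERS = 72        # assumed number of cases behind the coefficient
T_CRITICAL_70 = 1.994  # two-tailed .05 critical value for 70 df, from standard tables

t_value = t_for_r(0.21, N_TEACHERS)
print(round(t_value, 2), t_value > T_CRITICAL_70)
```

Under these assumptions the t value falls short of the critical value, which is consistent with the description of the relationship as positive but statistically insignificant.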

If all possible response patterns to the various
question options had been studied for
group A as well as group B, it might have been
possible to explain why the ‘‘rationally’’ keyed
judgment test scores correlated higher with the su-
pervisory ratings than did the cross-validated
groups. If there are k options in a question, and
the (k - 1) option is the option ‘‘omit all items’’,
then $1 + \binom{k}{1} + \binom{k}{2} + \cdots + \binom{k}{k}$ patterns are pos-
sible, where the second to the last terms are sim-
ply the binomial coefficients.

While positive relations were found between 
the various variables appearing in the above three
questions one must conclude that the relations 
were too small to be of practical predictive value. 
It is, however, possible that with the revised 
teacher judgment test higher correlation coeffi- 
cients and thus better predictive measures could 
be obtained. The revision was made by several
professors and advanced graduate students.

Further research is needed in this area to reach
the long-sought goal of finding a meas-
ure that predicts early and adequately later teach-
ing success. Perhaps, however, the M-Blank
ratings for groups heterogeneous with respect to 
major fields and teaching location are inherently 
unpredictable. 
AN INVESTIGATION OF SECURITY-INSECUR-
ITY AND ACHIEVEMENT-BOREDOM IN
ELEMENTARY SCHOOL CHILDREN 


EARL W. KOOKER 
North Texas State College 
Denton, Texas 


Introduction 


IN THE PAST several years, the literature 
concerned with the elementary school child has 
increasingly emphasized the relationship of the 
child’s personality adjustment to his behavior in 
the school situation. In discussions of the content
and methods of instruction, for example, various 
concepts such as ‘‘feelings of insecurity’’ are 
often introduced to account for differential behav- 
ior of children in a given situation. 

This study is concerned with an attempt to 
formulate a method of measuring four adjustment 
variables which appear frequently in the litera-
ture: ‘‘feelings of security,’’ ‘‘feelings of insecur-
ity,’’ ‘‘feelings of achievement’’ (here used to
mean that a child feels the activities in which he
is engaged are worthwhile) and ‘‘feelings of bore-
dom.’’

In the present investigation, the procedure of 
using an observer’s check list was selected as the 
method for defining these concepts. It seemed to
the investigator that this procedure tended to elim- 
inate several of the difficulties often encountered 
in ‘‘self-rating’’ scales. 

A test of the predictive usefulness of the meas- 
ures developed in this study was made by investi- 
gating the hypothesis that a relationship exists be- 
tween amount of tardiness in school and the child’s 
rank on the achievement-boredom measure.


Statement of the Problem 





The study involved the investigation of two hy- 
potheses. One is concerned with the development 
of procedures for identifying the four ‘‘feeling 
states’’ mentioned above; the second has to do with 
the relation between two of these defined vari- 
ables and amount of tardiness in school. 








The first hypothesis can be divided into three 
parts: 


1. That professionally trained observers will 
be able to consistently assign behavior de- 
scriptions to specified categories described 
in terms of inferred ‘‘feeling states. ’’ 


2. On the condition that sub-hypothesis ‘‘1’’ is
sustained, that the items defined in that opera-
tion can be scaled by the method of succes-
sive intervals and can be utilized by an ob-
server in a rating situation.


3. On the condition that sub-hypothesis ‘‘2’’ is
sustained, that the scale scores will be
stable for different observers and for re-rat-
ings by the same observer.


The second hypothesis is: Children possessing 
different scores on the achievement-boredom scale 
will show differences in the incidence of tardiness 
behavior in school. 


Procedures 


To test sub-hypothesis ‘‘1’’, descriptions of a 
variety of children’s school behavior patterns, 
which have been suggested as indicative of one of 
the four feeling states, were obtained from the lit- 
erature, conversations with teachers, and obser- 
vations of children in school. The behavior pat- 
terns were limited to school behavior since the 
completed rating forms were to be used by an ob- 
server in the school situation. Eighty-eight such 
items were developed, each item consisting of 
a lead sentence followed by a supplementary description
of the behavior and the situation in which
it occurred. For example: 

The child who asks to be reassured by the teacher
concerning his academic work.

He asks what his grade is, how well he’s
doing in his work, whether he’s doing it right,
etc. This refers to work which has been as-
signed and complete instructions given. Even
if he is told that he is doing well or satisfactor-
ily, he continues to ask for reassurance. This
occurs especially after others or the teacher
have emphasized the necessity for doing well
in his work.

*This is a summary of a dissertation submitted in partial fulfillment of the Ph.D. to the Graduate School
at the State University of Iowa in February 1951. The writer wishes to thank the co-chairmen of his
committee, Dr. Ralph H. Ojemann and Dr. Harold Bechtoldt, for their guidance in the carrying out
of the study.


The list of items was submitted to six gradu- 
ate students in the departments of psychology and 
child welfare for classification into seven categories,
six of which were feeling states (insecurity,
aggression, achievement, security, submission,
boredom) and the seventh a ‘‘none’’ category.
The judges were asked to evaluate each item by: 


1. Checking ‘‘none’’ if they felt the item indi- 
cated none of the states listed, 

2. Single checking all feelings which they felt 
the item indicated, 

3. Double checking the feeling which they felt 
the item indicated most clearly relative to those 
listed, 

4. Triple checking a feeling if they felt the item 
indicated it more clearly than any such state they 
could recall. 


The criterion adopted for the acceptance of an 
item for use in the second phase of the study was 
that at least four of the six judges must have 
double- or triple-checked the same category for 
that item. 
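
The acceptance rule just described can be sketched in a few lines. The data layout below (one dictionary of check counts per judge) and the function name are illustrative assumptions, not part of the original procedure; only the threshold of four of six judges giving a double or triple check comes from the text.

```python
from collections import Counter

# Hypothetical encoding: each judge's response to one item is a dict mapping
# a category name to the number of checks given (1 = single, 2 = double,
# 3 = triple). A "none" response carries no feeling-state category.
FEELING_STATES = {"insecurity", "aggression", "achievement",
                  "security", "submission", "boredom"}


def accepted_category(judgments, min_judges=4):
    """Return the category that at least `min_judges` judges double- or
    triple-checked, or None if the item fails the criterion."""
    votes = Counter()
    for checks in judgments:
        for category, n_checks in checks.items():
            if category in FEELING_STATES and n_checks >= 2:
                votes[category] += 1
    if not votes:
        return None
    category, count = votes.most_common(1)[0]
    return category if count >= min_judges else None
```

An item passes only when the most frequently double- or triple-checked category reaches the threshold; single checks and ‘‘none’’ responses never count toward it.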

As a final step in this phase of the study, the 
eighty-eight items were submitted to two clinical 
instructors in clinical psychology with the same 
instructions as those given to the graduate stu- 
dents. 

To test sub-hypothesis ‘‘2’’ regarding the scaleability
of the items selected by the judges, twenty
graduate students in education, psychology and
child welfare were asked to scale the items by the
method of successive intervals.¹ This method
was chosen because it is less time consuming
than some of the other scaling methods, yet it
offers a test of internal consistency.

In the list of items presented to the judges, 
each item appeared three times, each appearance 
specifying a different frequency of occurrence of 
the behavior described. The three degrees were 
represented by the phrases ‘‘frequently,’’ ‘‘fairly 
often’’ and ‘‘seldom.’’ This was done in anticipa- 
tion of giving raters three choices of frequency of 
occurrence in the final rating scale. 

The security-insecurity items were scaled on
one continuum and the achievement-boredom on
another. The judges were asked to consider both
the frequency and type of behavior in placing
the item in one of eleven categories. On the
security-insecurity continuum, category one represented
the highest degree of security and eleven the
highest degree of insecurity. On the achievement-boredom
continuum, category one represented the
highest degree of achievement and category eleven
the highest degree of boredom.

*All footnotes will be found at end of article.

As a result of this step each variation of each
item was assigned a scale value proportionate to 
the degree of security-insecurity or achievement- 
boredom the judges felt the item indicated. 

In the third phase of the study, rating-rerating
data and between-observer ratings of groups of
children were obtained. The rerating data were
obtained at intervals of two weeks. In all ratings
the achievement-boredom and security-insecurity
items were randomly presented. The raters were
told not to discuss the items with one another and
were not informed of the feeling states on which
ratings were being obtained. They were instructed
to indicate one of three degrees of frequency
of occurrence (‘‘frequently,’’ ‘‘fairly often,’’ and
‘‘seldom’’) for each item, choosing the degree
which in their opinion best characterized the child.

Data were obtained from three schools. Schools
A and C were midwest college experimental
schools. School B was a midwest consolidated
school.

In April, all the children in the sixth grade (12 
boys and 14 girls) in School A were rated and re- 
rated at an interval of two weeks by an outside ob- 
server and the regular teacher. In November, 
ratings and reratings, by the regular teachers, 
were obtained for two sixth grades in School B 
(ten boys and nine girls in one group and 13 girls 
and five boys in the other). In December ratings 
by two observers were obtained for a fifth-grade 
group in School C (11 boys and eight girls). Because
of the organization of School C it was necessary
to use the present teacher as one observer
and the teacher who had taught the class the pre- 
vious year and summer as the other. 

Using the data from the second between-observer
ratings of School A, items with the highest
between-observer consistency were selected, using
Guttman’s (3) test of item reliability. The
items retained as having satisfactory reliability
(an average lower estimate of .40 and average upper
estimate of .50) were used to calculate inter-rater
and intra-rater reliability coefficients
(Pearson’s r). In these calculations the remaining
ratings were used.

To determine whether the continua ‘‘security-insecurity’’
and ‘‘achievement-boredom’’ were bipolar
in nature, separate security and insecurity
scores and separate achievement and boredom
scores were obtained for all four groups of subjects
listed above. To obtain a score for each
child, ‘‘frequently’’ was given a value of one;
‘‘fairly often,’’ two; and ‘‘seldom,’’ three. Correlations
were then calculated between security
and insecurity scores and between achievement
and boredom scores for the four groups mentioned
above.
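
As an illustration of this scoring step, the sketch below codes the three frequency phrases as described and computes a product-moment correlation between two sets of scores. Summing the coded values over items to reach a child's score is an assumption (the text does not say how the values were combined), and the function names are hypothetical.

```python
from math import sqrt

# Coding taken from the text: frequently = 1, fairly often = 2, seldom = 3.
FREQUENCY_VALUE = {"frequently": 1, "fairly often": 2, "seldom": 3}


def frequency_score(ratings):
    """Sum the coded values of one child's ratings on a set of items
    (summation is an assumed way of combining the item values)."""
    return sum(FREQUENCY_VALUE[r] for r in ratings)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

With this coding a low score means the behavior was reported often, so a negative correlation between, say, security and insecurity scores is what a bipolar continuum would produce.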

The final phase of the study was concerned with 
testing the hypothesis that if a child feels bored 
with the school, he will delay as long as possible 
entering the school situation and thus will show a 
higher frequency of tardy behavior than a child 
who feels he is achieving something significant in 
school. 

This relationship was investigated by using the 
Mann-Whitney (4) technique to test the signifi- 
cance of the difference in the incidence of tardi- 
ness between pupils in the upper and lower half of 
the achievement-boredom score distribution. A
statistical test could be made only in the group
from School A since it was the only school which
recorded tardiness at the beginning of each class
period and thus the only group which had a sufficient
frequency of recorded tardiness to justify a
statistical test. In this group the outside observer’s
rating was used to categorize the pupils on
achievement-boredom. The entire year’s tardiness
records were employed.
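
For readers unfamiliar with the Mann-Whitney technique, a minimal sketch of the U statistic follows. It counts cross-group pairs directly, with ties counted as one half; the function name and any data passed to it are hypothetical, and the significance lookup (tables or a normal approximation) is not shown.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts, over all cross-group pairs, how often a value from
    group_a exceeds a value from group_b, with ties counted as 0.5.
    U_a + U_b always equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b
```

In the present design, `group_a` and `group_b` would hold the tardiness counts of pupils in the upper and lower halves of the achievement-boredom distribution; the smaller of the two U values is the one usually referred to a table.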


Results and Discussion 





The results will be presented in the same or- 
der as were the procedures. 

Of the eighty-eight items submitted to the 
judges, forty-nine satisfied the criterion that four 
of six judges must have double or triple checked 
the same feeling for that item. Twenty-one of the 
items were achievement-boredom items and 
twenty-eight were security-insecurity items. 
Twenty-four of the forty-nine items had been 
double or triple checked by four judges; nineteen 
by five judges; six by all six judges. 

The two clinical instructors agreed with the 
six graduate students to the following extent: 


1. In twenty-three of the selected items, both 
instructors double or triple checked the same cat- 
egory as the judges. 

2. In nineteen cases, one of the instructors 
double or triple checked the same category.

3. In seven instances neither of the instructors 
double or triple checked the same category as the 
judges, but in five of these at least one did specify
that some degree of the feeling was indicated. 


Thus the results indicated that some agree- 
ment could be secured as to the ‘‘class name’’ 
which should be associated with the items, though 
the agreement was by no means perfect. 

The twenty-one achievement-boredom items
and twenty-eight security-insecurity items selected
in the first phase of the study were the items
scaled by the twenty judges. In the method of successive
intervals, items are said to meet the test
of internal consistency if the estimates of the interval
boundary points obtained from overlapping
discriminal processes are approximately equal.
The standard deviations of these estimates were
calculated for each interval boundary point. This
was done for both the achievement-boredom and
security-insecurity scales.

On the achievement-boredom scale this standard
deviation ranged from .20 for boundary point
number eight to .06 for boundary point number
two with a mean and median of .14 for the ten
boundary points. On the security-insecurity scale
this standard deviation ranged from .34 for boundary
point number one to .02 for boundary point
number nine with a mean for the ten sigmas of .11
and a median of .05. The two relatively large values
of .34 for boundary point one and .29 for
boundary point two accounted for this difference
between mean and median values.

Mosier’s (5) shortcut method of scaling pro- 
vides a method for estimating the discriminal dis- 
persions of each item in terms of the scale units. 
In this study, eighty-three percent of the security- 
insecurity items had dispersions of less than 1.50 
and eighty percent of the achievement-boredom
items had dispersions of less than this value. 

These data indicate that it is possible to scale 
this type of item with considerable consistency. 
In interpreting these results, it should be recog- 
nized that more stable estimates of the boundary 
points and scale values might have been obtained 
if a larger number of judges had been used. Fur- 
thermore, in this investigation the lines used for 
estimating the sigmas and intercepts were fitted 
by inspection and the estimates so obtained may 
differ slightly from lines fitted by a method such 
as least squares. These factors were recognized 
in planning the study but it was felt that tests of the 
scaleability of the items and usefulness of such 
concepts in a relational study should be carried 
out before the additional labor of procuring more 
judges and using more precise methods of calcula-
tion was undertaken.

In the third phase of the study, the application
of Guttman’s (3) procedure resulted in the selection
of seventeen achievement-boredom items.²

The scaled scores of the thirty-six items selected
in step three were used to calculate rating-rerating
coefficients and between-observer coefficients.
These correlations for the various samples
are presented in Table I.

It will be noted that the rating-rerating coefficients
are fairly satisfactory, all approaching or
being greater than .90 with the exception of rater
1’s rating-rerating on the security-insecurity
scale. This may have been due to the relatively
short time she spent in observing the children before
her first rating.

The between-observer correlations are not as
high as may be desirable. How much of an effect
the rating conditions had on this correlation is a
matter for further investigation. In School A, the
outside observer did not have the same freedom
to move about, hear what the children said and
have as continuous association with the children
as did the teacher. In the School C group the two raters
were not observing the children at the same time.

On the other hand, it should be recognized that 
some of the agreement between teachers in rating 
this group could have been due to the child’s rep- 
utation following him from one grade to the next. 
Though they were told not to discuss their ratings,
it was, of course, impossible to determine the ef- 
fects of previous discussions. 

In order to determine whether there was a sig- 
nificant difference among these reliability coeffi- 
cients, a test of homogeneity described by Rider 
(8) was employed. The test was applied to the 
rating-rerating correlations for both scales and 
to the between-observer correlations for both 
scales. In the group from School A the outside 
observer’s rating-rerating was omitted since it 
could not be considered to be independent of the 
teacher’s rating of the same children. In no in- 
stance could the hypothesis of homogeneity be re- 
jected at the five percent level; therefore, mean 
correlations, using Fisher’s ‘‘Z’’ technique, were 
calculated with the following results: 


1. The mean r for rating-reratings on the S-I
scale was .92.

2. The mean r for rating-reratings on the A-B
scale was .94.

3. The mean r for between-observer ratings
on the S-I scale was .69.

4. The mean r for between-observer ratings
on the A-B scale was .71.


All of these mean r’s were significantly different 
from zero. 
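
The averaging of correlations through Fisher's z can be sketched as follows. Weighting each z by n - 3 is the conventional choice and is assumed here; the text does not state what weights, if any, were used, and the function name is hypothetical.

```python
from math import atanh, tanh


def mean_correlation(rs, ns):
    """Average correlations through Fisher's z transformation.

    Each z = atanh(r) is weighted by n - 3, the reciprocal of its
    approximate sampling variance (an assumed weighting); the weighted
    mean z is transformed back to a correlation with tanh.
    """
    weights = [n - 3 for n in ns]
    mean_z = sum(w * atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return tanh(mean_z)
```

Averaging in the z metric rather than averaging the r's directly avoids the bias that arises because the sampling distribution of r is skewed when the population correlation is far from zero.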

In addition to the correlations between ratings,
another factor to be considered in investigating
the objectivity and stability of scale scores is the
difference between mean scores. The significance
of the difference between all rating-rerating
means and between-observer means was tested
by the use of ‘‘t’’ for related measures. The results
of applying this test appear in Table II.
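
A minimal sketch of ‘‘t’’ for related measures is given below, assuming each child contributes one score on each of the two occasions (or from each of the two observers). The function name is hypothetical, and the comparison of t with a tabled critical value is left out.

```python
from math import sqrt


def paired_t(first, second):
    """Return (t, degrees of freedom) for related measures.

    The statistic is the mean of the pairwise differences divided by
    its standard error, with n - 1 degrees of freedom.
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / sqrt(var_d / n), n - 1
```

Because the test works on differences within each pair, it remains sensitive to a shift in mean level even when the two sets of ratings are highly correlated, which is why it complements the reliability coefficients above.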

An inspection of Table II reveals that the smallest
change in means tended to occur in the group
from School A. Perhaps this can be partially explained
by the fact that the raters in this group
had more opportunity to become familiar with the
items on the scale and with the children to be rated.
The items were available to them for a longer
period of time than was true for the other raters
(several weeks as compared to about two weeks).
In School A the ratings were made in the spring
and in the other groups in the fall. It will be recalled,
too, that one of the two raters in School C was not
observing the children at the time the ratings were
made.

A longer training period in preparation for the
ratings may be desirable, more explicit directions
than were given might be helpful, and further study
of item objectivity may be necessary. These results,
along with those on the correlations, suggest
that the scales, in their present form, may be useful
in studies concerned with the relationship of
scale scores to other variables but cannot reliably
be used in normative studies.

In the next phase of the study, the correlations
between achievement and boredom frequency
scores and between security and insecurity frequency
scores provided some evidence to indicate
that these pairs of feelings tended to represent
continua in the behavior of children. There was
a tendency for children showing a high frequency
of achievement behavior to show a low frequency
of boredom and vice versa. A similar relationship
was evident between the frequency of security
and insecurity behavior. The correlations for the
four samples are presented in Table III. The
teacher’s ratings were used in each group. The
four security-insecurity and the four achievement-boredom
correlations were tested for homogeneity
and in neither case could the hypothesis of homogeneity
be rejected at the five percent level. The
mean correlation for security-insecurity was -.74.
Both of these correlations are significantly different
from zero.

Next, the relationship between the security-in- 
security scaled scores and the achievement-bore- 
dom scores was investigated by correlating these 
scores from the same four ratings used above. 
These correlations are presented in Table IV. 

A mean correlation of .78 resulted when these 
four correlations were combined after being test- 
ed for homogeneity. This correlation suggests 
that achievement-boredom and security-insecurity, 
as here defined, are rather closely related in chil- 
dren’s behavior. How much of the relationship 
between the two scales is a function of ‘‘halo ef- 
fect’’ is not clear. Though the teachers were not 
told on what variables they were rating, it is pos- 
sible that they used some sort of ‘‘good-bad’’ 
standard when making both sets of ratings. Per- 
haps using trained outside observers as raters, 
since they would not be ego-involved in the type of
behavior children display in school, would throw 
further light on the usefulness of maintaining the 
two sets of traits as separate concepts. 

In the final phase of the study, an analysis of
the relationship between achievement-boredom
and incidence of tardiness revealed that there was
a significant difference in tardiness between those
pupils in the upper and lower halves of the achievement-boredom
distribution. In School A, the average
frequency of tardiness for those in the upper
half was 3.77 with a sigma of 5.35 and in the lower
half the mean was 8.62 and the sigma was 5.75.
This difference was significant at the two percent
level as tested by the Mann-Whitney (4) technique.


TABLE III

PRODUCT-MOMENT CORRELATIONS BETWEEN SECURITY AND INSECURITY
FREQUENCY SCORES AND BETWEEN ACHIEVEMENT AND
BOREDOM FREQUENCY SCORES

                                     Sample

Variables                                     School B     School B
Correlated         School A     School C      Group 1      Group 2
                   N = 26       N = 19        N = 18       N = 19

Security and
Insecurity

Achievement and
Boredom


TABLE IV

CORRELATIONS BETWEEN ACHIEVEMENT-BOREDOM AND
SECURITY-INSECURITY

                                     Sample

                                              School B     School B
                   School A     School C      Group 1      Group 2

Correlation        .85          .47           .79          .79


Tardiness records were also obtained from the 
two samples in School B, but there were too few 
incidences of tardiness recorded to justify a test 
of significance. In these schools the records 
were available only to December first and tardi- 
ness was recorded only once in the morning. In 
one group there were only three frequencies re- 
corded, all in the lower half of the achievement- 
boredom distribution. In the other group there 
was a total of fourteen recorded, twelve of which 
were in the lower half of this distribution. 


Summary and Conclusions 





This study was concerned with: (1) the devel- 
opment of a procedure for defining and identifying 
the feeling states of security, insecurity, achieve- 
ment and boredom, and (2) an investigation of the 
relationship between achievement-boredom scores 
and tardiness behavior in school. 

The following results were obtained: 


1. Considerable agreement could be obtained
among professionally trained judges as to
which behavior patterns can be used to define
each of the four states under investigation.

2. The security-insecurity and achievement-boredom
items selected in ‘‘1’’ satisfactorily
met a test of internal consistency when
scaled by the method of successive intervals
on an eleven-point continuum.

3. The mean rating-rerating correlation for
the scales was .93 and the mean between-observer
correlation was .70. Significant
differences between rating-rerating means
and between between-observer means indicated
a need for a longer training period in preparation
for making ratings and for some modification
of some items.

4. Negative correlations between achievement
and boredom frequency scores and between
security and insecurity scores suggested
that each pair of states represented a continuum
in the behavior of children. Appreciable
positive correlations between security-insecurity
and achievement-boredom scores
raised some question as to the usefulness of
maintaining the two as separate concepts.

5. Children rated high in a feeling of achievement
were tardy less (significantly so in the
group in which a test could be made) than
those designated as being bored with school.


A list of the items comprising the security-insecurity
scale together with the scale values is
given in Appendix A (see original dissertation).


The Field of Accounting





PUBLIC ACCOUNTING is a comparatively 
young profession and is a rapidly growing one. 
From August 1940 to November 1956, the American
Institute of Accountants, which is the national
organization of public accountants, increased in 
membership from 5437 to 28,535, or more than 
500 percent. It is estimated that there are now 
in the United States approximately 19,000 certi- 
fied public accountants in public practice. The 
total number of accountants, including those in 
public and private practice and government ac- 
counting, but excluding those doing routine book- 
keeping work, is estimated to be more than 200,000. 

The importance of accounting in our financial
structure is evident to everyone. Modern business
could not function without accountants in both
public and private practice. The accountant stands
in a unique role of trust. He is the final arbiter
of the financial condition of many thousands of business
organizations ranging from small private
enterprises to giant corporations and including
profit-making, non-profit, and governmental organizations.
It is essential that men of high ability
and unassailable integrity be attracted to,
trained for, and retained in this profession.


Brief History of Accounting Tests 





Objective testing was almost unknown in the
accounting field until about fifteen years ago. In
1943, the American Institute of Accountants, in
cooperation with a number of large accounting
firms, began a measurement project as part of
a larger program of improvement of the selection
of personnel for public accounting. Dr. Ben D.
Wood of Columbia University was appointed project
director, and the Educational Records Bureau
was designated as the operating organization
for the project. After about two years of exploratory
work in test construction and try-out, the
testing program was placed on a service basis
starting in 1946 and 1947. Since that time, test
materials and services have continuously been
available to colleges and to employers in public
accounting firms and business and industrial organizations.

*All footnotes will be found at end of article.

From the beginning, recognition was given to 
the imperative need to emphasize the use of the 
tests in the selection and appraisal of young men 
for college training in the accounting field. In or- 
der to encourage college use of the tests, the 
charge to colleges has been set below actual cost, 
which fact has made it necessary for the program 
to be partly subsidized by the Institute. The 
materials and service charges to employers are 
more substantial. 

A recent development in the accounting testing 
program has been the provision of counseling in- 
struments for use at the high school senior level. 

A continuous research program is carried on in 
connection with the project. More than thirty ar- 
ticles reporting research have been published by 
the project office and the Institute, and a considerable
amount of research on the tests has appeared
in other places, such as in Doctor’s and Master’s 
theses. 


Kinds of Tests Developed and Used 





The tests used in this project are of three kinds: 
1) tests of aptitude or orientation toward the ac- 
counting field, 2) achievement tests, and 3) meas- 
ures of interests. It has not been found possible 
to include a fourth area, personal qualities, as a 
part of the regular testing program for account- 
ants, although the importance of this area is rec- 
ognized, and much informal appraisal of various 
aspects of the personality of their employees is 
done by public accounting firms. 

Orientation Test—The Orientation Test, available
for use throughout all college years and in employment
situations, is a test of mental ability
based on materials appropriate to the field of business.
It provides a verbal score derived from a
vocabulary subtest and a reading subtest, a quantitative
score based upon business arithmetic problems,
and a total score. There are three forms
of this test, each of which requires fifty minutes
of working time.

In addition to the college level Orientation Test,
an Accounting Orientation Test for High School
Seniors has been available since 1953. It exists
in two forty-minute forms, each of which covers
accounting vocabulary, arithmetic reasoning, and
simple accounting problems for which no previous
study is required.

Achievement Tests—There are two levels of 
the accounting Achievement Tests. Level I, which
exists in three forms each calling for two hours 
of working time, is planned for use with students 
who have had at least one full year of accounting 
or the equivalent. It may also be used with 
second-year students. Form A yields a total score 
based on questions in the following areas—account 
classification, accounting vocabulary, arithmetic 
of comparative profit and loss statements, enter- 
ing and posting, bank reconciliation, adjustment 
in ten-column worksheet, analysis of depreciation 
histories, and tracing the effect of errors. The 
other two forms contain subtests which are sim- 
ilar to those in Form A but not identical with 
them. Recently, three fifty-minute forms, with 
a less extensive coverage, have also been made 
available. 

The Achievement Test, Level II, is intended
for use at the end of the senior year in college or
with applicants for employment or employed accountants.
It is available in two four-hour forms
and two two-hour forms. Form A, one of the four-hour
tests, contains the following parts: fundamental
classification relationships, entering transactions
in books of original entry, posting books of
original entry, analysis of adjustments, analysis
of comparative operating statements of branches,
cash record and bank reconciliation, analysis of depreciation
histories, tracing the effect of errors,
inventory methods, influence of inventories on net
profit, comparison of inventory methods, and auditing.
The two-hour forms contain fewer questions
on accounting and none on auditing. Each of
the four forms yields one overall score. It is possible
to obtain part scores as well, but, since these
are not very reliable, no norms have been established
for them.





Interests—Early in the project, a study was 
made of the Strong Vocational Interest Blank for 
Men on the basis of blanks filled out by more than 
two thousand public accountants in the United 
States and more than one thousand in Canada. The
findings were used in developing typical profiles 
of interests on twenty-seven occupational scales, 
including accountant and CPA, for the public ac- 
countants in each country and for different levels 
of employed accountants, such as junior, semi- 
senior, senior, manager, and partner. A special 
form was devised so that the vocational interest 
profile of an individual could be compared with 
the general pattern of the interests of men in the 
profession, and this form is used in reporting re- 
sults for all persons who take the Strong blank in 
this program. 


When the Committee on Accounting Personnel,
under whose auspices the project is carried on,
turned its attention to the provision of instruments
for use in counseling at the high school level, the 
need for interest measurement at that level was 
recognized. However, since the Strong blank is 
not as well suited for use with high school pupils 
as with college students and adults, it was decid- 
ed to experiment with the development of special 
profiles for public accountants on the Kuder Pref- 
erence Record-Vocational and the Kuder Prefer- 
ence Record-Personal. In 1952, both these forms 
were filled out by 578 practicing members of the 
American Institute of Accountants representing 
different levels of employment, various sizes of 
firms, a large age range, and a wide geographi- 
cal distribution. A study was made of the results 
for a group of 516 accountants who were satisfied 
with their work and a group of sixty-two account- 
ants who said that they were not satisfied, and typ- 
ical profiles for the satisfied accountants were 
established on both preference records. These 
profiles were printed and made available so that 
high school guidance personnel could use them in 
counseling pupils about their patterns of Kuder interests
as compared with the interests of accountants.


College Accounting Testing Program 





The project office provides a service program
for the Orientation, Achievement, and interest
tests. The program has two broad aspects, referred
to as the College Accounting Testing Program
and the Professional Accounting Testing
Program.

The college program was begun in the academic
year 1946-47, and the tests have been available
to the colleges each fall and spring since that 
time. In 1951, testing at midyear was added to 
this program. 

All Orientation and Achievement Tests used in 
the college program must be returned to the pro- 
ject office for scoring, statistical analysis, and 
a report of the results in terms of raw scores and 
percentiles based on national norms. The charge
for the test material and service is fifty cents a 
test. Bulletins summarizing the results in parti- 
cipating colleges, and occasionally containing re- 
search studies of the tests, are issued after each 
testing program. 

In the spring of 1956, 219 colleges adminis- 
tered a total of about fifteen thousand tests in con- 
nection with this program. Since the beginning of 
services to the colleges, about five hundred and 
twenty-five colleges have participated, and the 
total number of tests administered to date is approximately
315,000.


Professional Accounting Testing Program 








The tests used in the professional testing program,
for men outside college, are the Orientation
Test, the Level II Achievement Test, Form
A (four hours), and Form C (two hours), and the
Strong Vocational Interest Blank for Men. The Institute
has established forty regional offices in
cities throughout the United States for the administration
of these tests. In addition, accounting
firms and business and industrial organizations
may administer the tests to members of their own
staffs or applicants for positions, provided they
arrange to have one of their staff members certified
by the project office to serve as examiner.
At present, there are 217 certified examiners
aside from those in the regional offices.

Either local scoring or project office scoring 
may be used in the professional program. Where
local scoring is used, the charge for the test ma- 
terials is $2.50 per individual for the Orientation 
Test, $2.50 for the Level II Achievement Test, 
and 10 cents for the Strong blank. If the tests are 
returned to the project office for scoring, the 
charge for material and service is $5.00 for the 
Orientation Test, $5.00 for the Achievement Test, 
and $2.00 for the Strong blank, or $12.00 per in- 
dividual for the complete battery. When central 
scoring is used, individuals may receive up to ten
copies of their scores and percentiles punched in- 
to special IBM cards at no extra charge. These 
cards provide an official record which may be 
used in employment interviews. 
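
The fee schedule just quoted can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the per-individual total for local scoring is simply the sum of the three quoted charges, a figure the text itself does not state. Amounts are held in cents to avoid rounding problems.

```python
# Charges per individual, in cents, as quoted in the paragraph above.
# The names below are hypothetical and used only for this illustration.
LOCAL_SCORING_CENTS = {
    "Orientation Test": 250,
    "Level II Achievement Test": 250,
    "Strong blank": 10,
}
CENTRAL_SCORING_CENTS = {
    "Orientation Test": 500,
    "Level II Achievement Test": 500,
    "Strong blank": 200,
}


def battery_cost_cents(fees, n_individuals=1):
    """Charge for giving the complete battery to n_individuals."""
    return sum(fees.values()) * n_individuals


def as_dollars(cents):
    """Format an integer number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    print("Project office scoring:", as_dollars(battery_cost_cents(CENTRAL_SCORING_CENTS)))
    print("Local scoring:", as_dollars(battery_cost_cents(LOCAL_SCORING_CENTS)))
```

The project office figure reproduces the $12.00 per individual given above for the complete battery.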

Since testing in the professional program is 
ordinarily on an individual or small-group basis, 
the volume of the professional program is a good 
deal smaller than that of the college program. Up 
to December 1, 1956, about 25,000 tests had been 
given in the professional program. 


Program for the High School 





In the college and professional programs, the 
tests are never sold outright to users but remain 
under the control of the Committee on Accounting 
Personnel at all times. This enables the project 
office to make sure that copies of the tests are not 
left in the hands of students or others who might 
gain an unfair advantage in future administrations
of the same forms of the tests. Such precautions
are necessary, since these tests may be used in 
screening for entrance to the study of accounting 
in college or for employment. 

In the case of the High School Accounting Ori-
entation Test, however, it is not necessary to use
the same kind of safeguard, since the uses of this 
test are not for selection but for guidance and 
counseling. So, the high school orientation test 
may be purchased by schools and kept on hand 
for use with individuals or groups as needed. The 
test is usually scored at the school, although if 
project office scoring is desired, it may be ob- 
tained at a cost of 20 cents a test, which includes 
a report of the results. Thus far, the use of the
high school orientation test has been small, al-
though the societies of CPA’s in a number of states
have indicated an interest in sponsoring the use of
this test as a guidance instrument in the high 
schools of their states. Since July 1953, about 
8600 of these tests have been distributed for use 
in high schools. 


Norms 


Rather extensive percentile norms are avail- 
able for interpretation of the results of the account- 
ing tests. In the college program, fall, midyear, 
and spring norms have been established on the Or- 
ientation Test for one, two, three, and four years 
of accounting study. There are also fall, midyear,
and spring norms for one, two, and three years of
study on the Level I Achievement Test and for 
two years of study, three years of study, and grad- 
uating seniors on the Achievement Test, Level II. 

In the professional program, employed account- 
ant norms were set up for the Orientation Test, 
Form A, and Achievement Test, Level II, Form
C, on the basis of a special staff testing program 
carried on by accounting firms in the spring of 1950. 
There are also some norms for employed account- 
ants on Form A of the Level II Achievement Test. 

The two forms of the High School Accounting 
Orientation Test, Forms S and T, are accompa- 
nied by percentile norms for public high school 
seniors. These are spring norms for high school 
seniors in general, without regard to course of 
study. Thus far, separate norms for commercial 
course students have not been established on this 
test. 


Reliability and Validity 





There is more information on the reliability
and validity of the accounting tests than can be re- 
ported in detail in this study, but an attempt will 
be made to summarize the data. 

Reliability—Most of the reliability data are 
Spearman-Brown odd-even correlations based on
the scores of college students. All these are de-
rived from results for students at a given level of
study. The medians of the reliabilities are ap-
proximately as follows: Orientation Test, Ad-
vanced Level, verbal score, .90; quantitative
score, .80; total score, .91; Orientation Test,
High School Level, vocabulary, .87; arithmetic
reasoning, .76; accounting problems, .78; total
score, .91; Achievement Test, Level I, two-hour
form, .94; one-hour form, .89; Achievement Test,
Level II, four-hour form, .97; two-hour form, .88.
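
For readers unfamiliar with the odd-even method named above, the following is a minimal sketch of how such a coefficient is usually obtained: the scores on the odd-numbered and even-numbered items are correlated, and the half-test correlation is stepped up to full length with the Spearman-Brown formula. The function names and the small item matrix are hypothetical and are not data from the accounting testing program.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, k=2.0):
    """Estimated reliability of a test k times as long as the one
    whose reliability is r (k = 2 steps a half test up to full length)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1 for this sketch")
    return k * r / (1.0 + (k - 1.0) * r)


def odd_even_reliability(item_scores):
    """item_scores holds one list of item scores per examinee."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical right/wrong (1/0) scores of five examinees on six items.
EXAMPLE_ITEM_SCORES = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(odd_even_reliability(EXAMPLE_ITEM_SCORES), 2))
```

The same function with a different value of k gives the usual estimate of how reliability changes when a test is lengthened or shortened, which is the reason a shorter form is expected to yield a lower coefficient than a longer one.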

These reliability coefficients are about the 
same size as those reported for tests of aptitude 
and achievement in other fields. The total scores
of all these tests seem reliable enough for use in
the appraisal, selection, and guidance of individu-
al students, although the coefficient for the two-hour
form of Achievement Test, Level II, is perhaps a
little lower than might have been expected. Con-
siderable testing time is required to obtain a reliable
measure of achievement in the accounting field be-
cause the examinees must do a considerable
amount of reading about the test situation
on which the questions are based.

Validity—Evidence concerning the validity of 
the accounting tests is of three general kinds: 1) 
correlations with school or college grades, 2) cor- 
relations with CPA Examination grades, and 3) 
correlations with criteria of success in employ- 
ment. 

Coefficients of correlation between the vari-
ous tests and course grades may be summarized
as follows: 


Orientation Test, Advanced Level, versus 
Course Grades in Nine First-Year Accounting 
Classes Distributed among Nine Institutions: med- 
ian r, verbal score, .33; quantitative score, .43; 
total score, .43. 

Orientation Test, High School Level, versus
Grades in Bookkeeping Class in One High School:
vocabulary, .46; arithmetic reasoning, .59; arith-
metic problems, .49; total score, .59.

Achievement Test, Level I, Two-Hour Form, 
versus Grades in Thirty-One First-Year Account- 
ing Classes Distributed among Seventeen Institu- 
tions, median r, total score, .59. 

Achievement Test, Level I, Two-Hour Form, 
versus Grades in Six Second-Year Classes in 
Four Institutions, median r, total score, .54. 

Achievement Test, Level II, versus Grades in
Six Advanced Accounting Classes in Five Institu-
tions (Senior Year), median r, total score, .54.


All these correlations are significantly posi-
tive, with those for the Achievement Tests tend-
ing to run somewhat higher than those for the Or-
ientation Tests. The correlations seem about as
high as could be expected, in view of the rather 
low reliability of course grades and the fact that 
a variety of qualities and aspects of behavior of 
students help to determine grades assigned by in- 
structors. In one university, it was found that 
large differences between rank in class based on 
course grades and rank derived from Achieve- 
ment Test score could usually be explained when 
the individual’s background and record, including 
extra-curricular activities, were carefully 
studied. 
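
The point about the low reliability of course grades can be made concrete with the classical correction for attenuation, which estimates what a correlation would be if both measures were perfectly reliable. The sketch below is an illustration only and is not an analysis reported in this study: the validity (.59) and test reliability (.94) are the medians quoted above for the two-hour form of Achievement Test, Level I, but the reliability of .60 assumed for course grades is a hypothetical value chosen purely for the example.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate of the correlation between true scores, given the
    observed correlation r_xy and the reliabilities r_xx and r_yy."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must be greater than 0 and at most 1")
    return r_xy / sqrt(r_xx * r_yy)


OBSERVED_VALIDITY = 0.59   # median r with first-year grades, quoted above
TEST_RELIABILITY = 0.94    # median odd-even reliability, quoted above
GRADE_RELIABILITY = 0.60   # HYPOTHETICAL value, assumed for illustration

if __name__ == "__main__":
    estimate = correct_for_attenuation(
        OBSERVED_VALIDITY, TEST_RELIABILITY, GRADE_RELIABILITY
    )
    print(f"Estimated correlation between true scores: {estimate:.2f}")
```

Under any such assumption the corrected value exceeds the observed one, which is the sense in which unreliable grades set a ceiling on the correlations that can be obtained.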

The second kind of criterion used, the CPA Ex- 
amination, is an important basis for state certifi- 
cation of accountants to engage in public practice. 
Nearly all states use for this purpose a series of 
essay examinations prepared by an examining 
board appointed by the American Institute of Ac- 
countants and scored at the Institute by a trained 
group of graders. The examinations are a valu- 
able criterion for use in studying the predictive 
value of the objective accounting tests. 





Several studies have been made of the relation-
ship between scores on the accounting tests admin-
istered in college and grades on the CPA Examina-
tions taken some years later.

The following are medians of an extensive ser- 
ies of correlations: Orientation Test versus CPA 
Examinations, verbal score, .37; quantitative 
score, .42; total score, .46; Achievement Test, 
Level II, versus CPA Examinations, .54. These
correlations are about the same size as those be- 
tween the tests and grades in courses studied. In 
view of the time interval between the taking of the 
tests and the CPA Examinations, the correlations 
are rather favorable to the validity of the tests. 

The ultimate criterion for appraising the worth 
of a set of professional tests is success in per- 
forming the work of the profession. It is difficult, 
however, to appraise the value of tests for predic- 
tion of success on the job because the validity of 
the criteria of success is open to question. As in 
studies of many other tests, ratings of supervisors 
were the most readily available criterion of job 
success, and these were used in a number of 
studies of the accounting tests, even though rat-
ings vary a great deal from firm to firm and from
one supervisor to another. In the most comprehen-
sive of these studies, median correlations between
accounting tests and ratings in thirteen firms on a
scale which included quality of work, quantity of 
work, knowledge of accounting, ability to learn, 
dependability and integrity, initiative and responsi- 
bility, cooperation, and overall value to the organ- 
ization were as follows: Orientation Test versus 
ratings, verbal score, .29; quantitative score, .37; 
total score, .36; Achievement Test, Level II, ver- 
sus rating, .55. Perhaps because of differences 
in the care and accuracy with which the ratings 
were done, the correlations varied a great deal ac- 
cording to the firm from which they were obtained. 
The median correlation between test scores and
ratings ran as low as .19 in one firm and as high 
as .74 in another. In general, scores on the CPA 
and accountant scales of the Strong blank added 
little to the prediction based on the Achievement 
and Orientation Tests.2

Another criterion of job success used in a re-
cent study in one accounting firm is a salary index
based on increase in salary during a five-year
period.3 Correlations of test scores with salary
index for a group of ninety-eight accountants in
this firm were as follows: Achievement Test,
Level II, versus salary index, .33; Orientation
Test, total score, versus salary index, .22; 
Strong blank (CPA scale) versus salary index, .25. 

The multiple correlation of Level II Achieve- 
ment, Orientation total, and Strong blank with 
salary index was .40 for this group. 

These correlations indicate that the test scores 
are somewhat related to success in accounting em-
ployment situations but that many qualities other
than those measured by the tests also enter into
job success.
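
One common way to read the salary-index correlations is to square them, since the square of a correlation is the proportion of criterion variance linearly associated with the predictor. The sketch below applies that conventional reading to the coefficients reported above; the names are hypothetical, and the squared values are an interpretation added here, not figures given in the study.

```python
# Correlations with the salary index as reported in the text above.
REPORTED_CORRELATIONS = {
    "Achievement Test, Level II": 0.33,
    "Orientation Test, total score": 0.22,
    "Strong blank (CPA scale)": 0.25,
    "Multiple correlation (all three)": 0.40,
}


def variance_explained(r):
    """Proportion of criterion variance linearly associated with the predictor."""
    return r * r


if __name__ == "__main__":
    for label, r in REPORTED_CORRELATIONS.items():
        share = variance_explained(r)
        print(f"{label}: r = {r:.2f}, associated variance = {share:.1%}, "
              f"remainder = {1 - share:.1%}")
```

On this reading even the three measures combined leave most of the variation in the salary index unaccounted for, which agrees with the conclusion that many other qualities enter into job success.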


Some Outcomes of the Testing Program 





In summary, the contributions of this testing 
program to the field of accounting include the fol- 
lowing: 


1. Reasonably reliable and valid tests of apti- 
tude and achievement have been devised and made 
available to schools and colleges and to employers 
for use in selection and upgrading of personnel 
for the accounting field. Accounting norms have 
been established for these tests, as well as for 
two widely used inventories of vocational interests 
and one of personal preferences. 

2. Continuous services are available to schools, 
colleges, accounting firms, and business and in- 
dustrial organizations for scoring, statistical an- 
alysis, and reporting of results of the accounting 
tests in terms of national norms. 

3. A record of the results of the tests for each
individual examined is maintained in a permanent 
file in the project office, where it may be drawn 
upon at any time. These scores are regarded as 
the property of the individual concerned and are 
released only upon his written authorization. 

4. Research on the values of a variety of tests
for prediction of success in college and employ- 
ment has been carried on and is being continued. 

5. Finally, a considerable amount of interest- 
ing and useful research data is being accumulated 
about the nature of the personnel of the account- 
ing profession. For example, equating of scores 
on the accounting Orientation Test and the ACE 
Psychological Examination and study of the ACE
equivalents of Orientation Test medians for first-
year students in several colleges indicate that stu-
dents planning to enter accounting are a little
above the national average in verbal aptitude and
much above average in numerical aptitude.4 In
other words, the accounting field seems to be at-
tracting fairly able students, although there is a
need for further increase in the ability of those
going into the field.


The most extensive and dependable compari- 
sons are in the fields of interests and personal 
preferences, for it is in these areas that nation- 
ally used instruments have been applied to the ac- 
counting field. On the Strong blank, the interests 
of public accountants agree well with those of ac- 
countants and CPA’s, as would be expected, and 
also with those of production managers, purchas- 
ing agents, bankers, and personnel managers. 
They do not correspond with the interests of art-
ists, ministers, or psychologists.5 On the Kuder
Preference Record-Vocational, the average pub-
lic accountant is high in computational, clerical,
and literary activities but comparatively low in
social service, outdoor, and mechanical activi-
ties. The Kuder Preference Record-Personal sug-
gests that the average public accountant has some
preference for being active in groups and likes to
work with people and have new experiences.6 Thus
the typical public accountant emerges as a com- 
paratively intelligent, active, flexible, and soci- 
able individual who likes detailed computational 
work but also work in which he can give verbal ex- 
pression to his findings. 

While this testing program seems helpful to the 
accounting field, it has by no means reached its 
maximum usefulness. There are needs for im- 
proved tests, better norms for certain groups, 
further research on validity, and greater under- 
standing and use of these measurement devices at 
both the educational and employment levels. 